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<3 I. INTRODUCTION 

<N ; 

. A. Preamble 

The connection between the statistical physics of disordered systems and optimization problems in computer science 
' dates back from twenty years at least [43]. After all zero temperature statistical physics is simply the search for the 
t-H . state with minimal energy, while the main problem in combinatorial optimization is to look for the configurations of 
parameters minimizing some cost function (the length of a tour in the traveling salesman problem (TSP), the number 
' (-h ', of violated constraints in constrained satisfaction problems, ...) (57j . Yet, despite the beautiful studies of the average 
• properties of the TSP, Graph partitioning, Matching, based on the recently developed mean-field spin glass theory 
[43j |. a methodological gap between the fields could not be bridged 30]. In statistical physics statements are usually 
made on the properties of samples given some quenched disorder distribution such as the typical number of solutions, 
minimal energy ... In optimization, however, one is interested in solving one (or several) particular instances of a 
problem, and needs efficient ways to do so, that is, requiring a computational effort growing not too quickly with the 
number of data defining the instance. Knowing precisely the typical properties for a given, academic distribution of 
| ; , instances does not help much to solve practical cases. 

At the beginning of the nineties practitionners in artificial intelligence realized that classes of random constrained 
satisfaction problems used as artificial benchmarks for search algorithms exhibited abrupt changes of behaviour when 
some control parameter were finely tuned [47| . The most celebrated example was random K-Satisfiability, where 
one looks for a solution to a set of random logical constraints over a set of Boolean variables. It appeared that, for 
large sets of variables, there was a critical value of the number of constraints per variable below which there almost 
surely existed solutions, and above which solutions were absent. An important feature was that search algorithms 
performances drastically worsened in the vicinity of this critical ratio. 

This phenomenon, strongly reminiscent of phase transitions in condensed matter physics, led to a revival of the 
interface between statistical physics and computer science, which has not vanished yet. The purpose of the present 
lecture is to introduce the non specialist reader to the concepts and techniques required to understand the literature 
I in the field. For the sake of simplicity the presentation will be limited to one computational problem, namely, 
. linear systems of Boolean equations. A good reason to do so is that this problem concentrates most of the features 
encountered in other optimization problems, while being technically simpler to study. In addition it is closely related 
T^-j- , to error-correcting codes in communication theory, see lectures by A. Montanari and R. Urbanke in the present book. 
f^S ' Extension to other problems will be mentioned in the conclusions. 

t***'- \ The lecture is divided into three parts. Sections 1 and 2 are devoted to the presentation of the model and of 
elementary concepts related to phase transitions e.g. finite-size scaling, large deviations, critical exponents, symmetry 
breaking, ... Sections 3 and 4 expose the specific statistical mechanics techniques and concepts developed in disordered 
systems to deal with highly interacting and random systems, namely the replica and cavity approaches. Finally Section 
5 focuses on dynamics and the study of search algorithms. 



B. Linear systems of Boolean equations 

Linear systems of Boolean equations look very much like their well known counterparts for integer- valued variables, 
except that equalities are defined modulo two. Consider a set of N Boolean variables Xi with indices i — 1, . . . , N. 
Any variable shall be False (F) or True (T). The sum of two variables, denoted by +, corresponds to the logical 
exclusive OR between these variables defined through, 



F+T = T+F=T 
F + F = T + T = F 



(1) 
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In the following we shall use an alternative representation of the above sum rule. Variables will be equal to or 1, 
instead of F or T respectively. Then the + operation coincides with the addition between integer numbers modulo 
two. 

The following is a linear equation involving three variables, 

Xi + x 2 + x 3 = 1 . (2) 

Four among the 2 3 = 8 assignments of {x\, x%, X3) satisfy the equation: (1,0,0), (0,1,0), (0,0,1) and (1,1,1). A 
Boolean system of equations is a set of Boolean equations that have to be satisfied together. For instance, the 
following Boolean system involving four variables 

x\ + x 2 + x 3 = 1 

x 2 + X4 = (3) 

X\ + X4 = 1 

has two solutions: (xi, x 2 , x 3 , xa) = (1, 0, 0, 0) and (0, 1, 0, 1). A system with one or more solutions is called satishable. 
A trivial example of an unsatisfiablc Boolean system is 

f xi + x 2 + x 3 = 1 ... 
\x 1 +x 2 +x 3 = ■ { > 

Determining whether a Boolean system admits an assignment of the Boolean variables satisfying all the equations 
constitutes the XORSAT (exclusive OR Satisfaction) problem. In the following, we shall restrict for some reasons 
to be clarified in Section [TT| to K-XORSAT, a variant of XORSAT where each Boolean equation include K variables 
precisely. 

K-XORSAT belongs to the class P of polynomial problems [57j ■ Determining whether a system is satisfiable or 
not can be achieved by the standard Gaussian elimination algorithm in a time (number of elementary operations) 
bounded from above by some constant times the cube of the number of bits necessary to store the system 65] [57]. 

If the decision version of K-XORSAT is easy its optimization version is not. Assume you are given a system F, run 
the Gauss procedure and find that it is not satisfiable. Determining the maximal number Ms(F) of satisfiable equations 
is a very hard problem. Even approximating this number is very hard. It is known that there is no approximation 
algorithm (unless P=NP) for XORSAT with ratio r > ~, that is, guaranteed to satisfy at least r x Ms(F) equations 
for any F. But r = \ is achieved, on average, by making a random guess [66|! 



C. Models for random systems 



There are many different ways of generating random Boolean systems. Perhaps the simplest one is the following, 
called fixed-size ensemble. To build an equation we pick up uniformly at random K distinct indices among the N 
ones, say, ii,i2 and i/-. Then we consider the equation 

x i 1 + x.i 2 + . . . + Xi k = v . (5) 

The second member, v, is obtained by tossing a coin: v = or v = 1 with equal probabilities (one half) and 
independently of the indices of the variables in the first member. The process is repeated M times, without correlation 
between equations to obtain a system with M equations. 

Another statistical ensemble is the fixed-probability ensemble. One scans the set of all H — 2(^) equations one 
after the other. Each equation is added to the system with probability p, discarded with probability 1 — p. Then a 
system with, on average, pH equations (without repetition) is obtained. In practice one chooses p = 41 to have the 
same (average) number of equations as in the fixed-size ensemble. 

The above distributions are not the only possible ones. However they are easy to implement on a computer, are 
amenable to mathematical studies, and last but not least, lead to a surprisingly rich phenomenology. One of the key 
quantities which exhibits an interesting behaviour is 

Psat{N, a) — Probability that a system of random K-XORSAT with 
N variables and M — aN equations is satisfiable , 

which obviously depends on K and the statistical ensemble. Given N Psat is a decreasing function of a. We will 
see that, in the infinite size limit (and for K > 2), the decrease is abrupt at some well defined ratio, defining a 
phase transition between Satisfiable and Unsatisfiable phase [l8|. The scope of the lecture is to give some tools to 
understand this transition and some related phenomena. 
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II. BASIC CONCEPTS: OVERVIEW OF STATIC PHASE TRANSITIONS IN K-XORSAT 



In this Section we introduce the basic concepts necessary to the study of random K-XORSAT. It turns out that 
even the K = 1 case, trivial from a computer science point of view (each equation contains a single variable!), can 
be used as an illustration to important concepts such as scaling and self-averageness. Ideas related to the percolation 
phase transition and random graphs are illustrated on the K = 2 case. Finally the solution space of 3-XORSAT 
model exemplifies the notion of clusters and glassy states. 



A. Finite-size scaling (I): scaling function 



FigureQJleft) shows the probability Psat that a randomly extracted 1-XORSAT formula is satisfiable as a function 
of the ratio a, and for sizes N ranging from 100 to 1000. We see that Psat is a decreasing function of a and N. 

Consider the subformula made of the rij equations with first member equal to Xj . This formula is always satisfiable 
if m = or n.i = 1. If rii > 2 the formula is satisfiable if and only if all second members are equal (to 0, or to 1), 
an event with probability (^) ni_1 decreasing exponentially with the number of equations. Hence we have to consider 
the following variant of the celebrated Birthday problem |67|. Consider a year with a number N of days, how should 
scale the number M of students in a class to be sure that no two students have the same birthday date? 



P = 



M-l 

n 

i=0 



1 - 



N 



= exp 



M(M-l) 
2N 



0{M 3 /N 2 ) 



(6) 



Hence we expect a cross-over from large to small p when M crosses the scaling regime v^V- Going back to the 
1-XORSAT model we expect Psat to have a non zero limit value when the number of equations and variables are 
both sent to infinity at a fixed ratio y = M/VN. In other words, random 1-XORSAT formulas with N variables, 
M equations or with, say, 100 x N variables, 10 x M equations should have roughly the same probabilities of being 
satisifiable. To check this hypothesis we replot the data in Figure [1] after multiplication of the abscissa of each point 
by ^/N (to keep y fixed instead of a). The outcome is shown in the right panel of Figure [T] Data obtained for various 
sizes nicely collapse on a single limit curve function of y. 

The calculation of this limit function, usually called scaling function, is done hereafter in the fixed-probability 
1-XORSAT model where the number of equations is a Poisson variable of mean value M = yvN. We will discuss 
the equivalence between the fixed-probability and the fixed-size ensembles later. In the fixed-probability ensemble 
the numbers ni of occurence of each variable Xj are independent Poisson variables with average value M/N = y/ \/~N. 
Therefore the probability of satisfaction is 



P p SAT (N,a=J=) = 



-v/Vn 



(y/VN) n (I 



2e 



-y/(2y/N) 



,,-v/Vn 



n! 

JV 



N 



where the p subscript denotes the use of the fixed-probability ensemble. We obtain the desired scaling function 

^(y) = lim \nP p SAT (N,a = - *) = -!£, 



(7) 



(8) 



in excellent agreement with the rescaled data of Figure [T] (right) [19| . 



B. Self-averageness of energy and entropy 



Let us now consider random 1-XORSAT formulas at a finite ratio a, and ask for the distribution of the minimal 
fraction of unsatisfied equations, hereafter called ground state (GS) energy ecs- For simplicity we work in the fixed- 
probability ensemble again. The numbers n®,nj of, respectively, x, = 0, Xj = 1 are independent Poisson variables 
with mean ^. The minimal number of unsatisfied equations is clearly min(n°, nj). The GS energy is the sum (divided 
by M) of iV such i.i.d. variables; from the law of large number it almost surely converges towards the average value 



eas(a) = - (l - e- a I (a) - e^'I^a)) , 



(9) 
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FIG. 1: Left: Probability that a random 1-XORSAT formula is satisfiable as a function of the ratio a of equations per variable, 
and for various sizes N. Right: same data as in the left panel after the horizontal rescaling a — > a x \/iV = y\ note the use of 
a log scale for the vertical axis. The dashed line shows the scaling function $i(y) ©. 



where Ii denotes the £ th modified Bessel function. In other words almost all formulas have the same GS energy in 
the infinite N limit, a property called sclf-averageness in physics, and concentration in probability. 

How many configurations of variables realize have minimal energy? Obviously a variable is free (to take or 1 
value) if = nj, and is frozen otherwise. Hence the number of GS configurations is J\f = 2 Nf where Nf is the 
number of free variables. Call 

' = £ e - Q (f) n j^ = e ^ h{a) (10) 

n>0 v 

the probability that a variable is free. Then JVy is a binomial variable with parameter p among N; it is sharply 
concentrated around Nf — p N with typical fluctuations of the order of N 1 / 2 . As a consequence, the GS entropy per 
variable, sgs = Qogftf)/N, is self-averaging and almost surely equal to its average value sgs = pk>g2. 

Self-averageness is the very useful property. It allows us to study the average value of a random variable, instead of 
its full distribution. We shall use it in Section ULTl and also in the analysis of algorithms of Section IV El This property 
is not restricted to XORSAT but was proven to hold for the GS energy [l3| and entropy [52j of other optimization 
problems. 

Not all variables are self-averaging of course. A straightforward example is the number Af of GS configurations 
itself. Its q th moment reads N q = (1 — p + p 2 q ) N where the overbar denotes the average over the formulas. We see 
that N q ^> (N) q : Af exhibits large fluctuations and is not concentrated around its average. Very rare formulas with 
atypically large number Nf of free variables contribute more to the q th moment than the vast majority of formulas, 
and spoil the output. This is the very reason we will need the introduction of the replica approach in Section HTT1 



C. Large deviations for P S at (I): 1-XORSAT 

As we have seen in the previous sections 1-XORSAT formulas with a finite ratio a are unsatifiable with high 
probability i.e. equal to unity in the infinite N limit. For finite but large N there is a tiny probability that a 
randomly extracted formula is actually satisifiable. A natural question is to characterize the 'rate' at which Psat 
tends to zero as N increases (at fixed a). Answering to such questions is the very scope of large deviation theory 
(see O for an elementary introduction) . Looking for events with very small probabilities is not only interesting from 
an academic point of view, but can also be crucial in practical applications. We will see in Section IVCl that the 
behaviour of some algorithms is indeed dominated by rare events. 

Figure [5] shows minus the logarithm of Psat, divided by N, as a function of the ratio a and for various sizes N. 
Once again the data corresponding to different sizes collapse on a single curve, meaning that 

P S at{N, a) = e~ N "i(«)+ W . (11) 

Decay exponent u)\ is called rate function in probability theory. We can derive its value in the fixed-probability 
ensemble from (JT]) with y = a x \Z~N, with the immediate result 

d>(a) = a - In (2 e a ' 2 - l) . (12) 

The agreement with numerics is very good for small ratios, but deteriorates as a increases. The reason is simple. In 
the fixed-probability ensemble the number M of equations is not fixed but may fluctuate around the average value 
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I-XORSAT 




FIG. 2: Same data as FigurcfT] (left) with: logarithmic scale on the vertical axis, and rescaling by —1/N. The scaling functions 
uii (|14[) and uf (112[1 for, respectively, the fixed-size and fixed-probability ensembles are shown. 



M — aN. The ratio a = M/N, is with high probability equal to a, but large deviations (a ^ a) are possible and 
described by the rate function [68j|. 

f2(<S|a) = a — a — a ln(a/a) . (13) 

However the probability that a random 1-XORSAT formula with M equations is satisfiable is also exponentially small 
in N, with a rate function uJ\{a) increasing with a. Thus, in the fixed-probability ensemble, a trade-off is found 
between ratios a close to a (formulas likely to be generated) and close to (formulas likely to be satisfiable). As a 
result the fixed-probability rate function is 

wf(a) = min Li(<5) + Q(a|a)l , (14) 

a 

and is smaller than uii(a). It is an easy check that the optimal ratio a* = a/(2 — e~"/ 2 ) < a as expected. Inverting 
(|14|) we deduce the rate function u>i in the fixed-size ensemble, in excellent agreement with numerics (Figure [2]) ■ This 
example underlines that thermodynamically equivalent ensembles have to be considered with care as far as rare events 
are concerned. 

Remark that, when a — > 0, a = a + 0(a 2 ), and to\(a) — u>i(a) + 0(a 3 ). This common value coincides with the 
scaling function — $i(a) (fSj> . This identity is expected on general basis (Section IIIFj) and justifies the agreement 
between the fixed-probability scaling function and the numerics based on the fixed-size ensemble (Figure [TJ right). 



D. Percolation in random graphs 



Though 1-XORSAT allowed us to understand some general features of random optimization problems it is very 
limited due to the absence of interactions between variables. A more interesting problem is 2-XORSAT where every 
equation define a joint constraint on two variables. Formulas of 2-XORSAT can be represented by a graph with TV 
vertices (one for each variable), and aN edges. To each equation of the type Xi + Xj = e corresponds an edge linking 
vertices i and j, and carrying or 1 label (the value e of the second member). Depending on the input model chosen 
(Section II CI) multiple edges are present or not. 

As the formula is random so is graph. Figure [3] shows examples of graphs obtained for various values of a. Notice 
the qualitative change of structure of graphs when the ratio a varies from low values (graphs are mostly made of 
small isolated trees) to higher ones (a large part of vertices are now connected together). This change is known as 
the percolation transition in physics, or the appearance of a giant component in mathematics literature. 

Before reviewing some of the aspects of the percolation transition let us mention an important fact on the valency 
of vertices. As a result of the randomness of the graph generation process, each node share edges with a variable 
number of neighboring vertices. In the large N limit the degree v of a vertex, i.e. the number of its neighbors, is a 
Poisson variable with mean 2a, 

ProbaH = er 2a ^1 . (15) 
vl 

For instance the fraction of isolated vertices is e~ 2a . The average degree of a vertex, c = 2a, is called connectivity. 

It is natural to decompose the graphs into its connected subgraphs, called components. Erdos and Renyi were able 
in 1960 to characterize the distribution of sizes of the largest component [l2T |. 
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ABC 



FIG. 3: Examples of random graphs generated at fixed number M = aN of edges (fixed-size model without repetition). All 
graph include N = 20 vertices (grey dots). The average degree of valency, 2a, is equal to 0.5 (A), 1 (B), and 2 (C). The labels 
of the vertices have been permuted to obtain planar graphs, i.e. avoid crossing of edges. 



• When c < 1 , the largest component includes ~ In N / (c— 1 — In c) vertices with high probability. Most components 
include only a finite number of vertices, and are trees i.e. contain no circuit. 

• For c = 1 the largest component contain 0(N 2 / 3 ) vertices. 

• When c > 1 there is one giant component containing ~ 7(c)iV vertices; the others components are small i.e. 
look like the components in the c < 1 regime. The fraction of vertices in the giant component is the unique 
positive solution of 

l-7 = e~ C7 . (16) 
It is a non analytic function of c, equal to for c < 1, and positive above, tending to unity when c increases. 

The phenomenon taking place at c = 1 is an example of (mean-field) percolation transition. We now give a hand- 
waving derivation of (jTHJ) . Consider a random graph G over N vertices, with connectivity c. Add a new vertex A to 
the graph to obtain G' . If we want G' to be drawn from the same distribution as G, a number v of edges must be 
attached to A, where v an integer-valued random number following the Poisson distribution (|15[) . After addition of 
A, some connected components of G will merge in G' . In particular, with some probability p v , A will be part of the 
giant component of G' . To estimate p v , we note that this event will not happen if and only if none of the v neighbors 
of A in G' belongs to the giant component of G. Thus, 

1- Pv = (l~j) v , (17) 

where 7 is the size (fraction of vertices) of the giant component. Summing both sides of l|17p over the distribution 
(|15p for v, and asserting that the change in size of the giant component between G and G' is o(l) for large N, we 
obtain (fT6|) . 

The above derivation illustrates an ubiquitous idea in probability and statistical physics, which could be phrased 
as follows: 'if a system is very large, its statistical properties should be, in some sense, unaffected by a small increase 
in size'. This idea will be useful, in a more sophisticated context, in Section ITVl 



E. Sat/Unsat transition in 2-XORSAT 

Figure 0] shows the probability Psat that a randomly extracted 2-XORSAT formula is satisfiable as function of 
a, and for various sizes N. It appears that Psat drops quickly to zero for large N when a reaches the percolation 
threshold a c = i. For ratios smaller than a c the probability of satisfaction is positive, but smaller than unity. 

Take a < 5. Then the random graph G associated to a random 2-XORSAT formula is non percolating, and made 
of many small components. Identical components (differing only by a relabelling of the variables) may appear several 
times, depending on their topology. For instance consider a connected graph G' made of E edges and V vertices. The 
average number of times G' appears in G is a function of E and V only, 

Ne ' v= {v){n) { 1 -n) < 18 > 

since any vertex in G 1 can establish edges with other vertices in G", but is not allowed to be connected to any of the 
N — V outside vertices. When is very large compared to E, V we have 

N E , V ~ N v - E e~^ v . (19) 
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FIG. 4: Probability that a random 2-XORSAT formula is satisfiable as a function of the ratio a of equations per variable, and 
for various sizes N. The full line is the asymptotic analytical formula (|23p . 



Three cases should distinguished, depending on the value of V — E: 

• V — E = 1: this is the largest value compatible with connectedness, and corresponds to the case of trees. From 
(fT!?]) every finite tree has of the order of N copies in G. 

• V — E = 0: this correspond to trees with one additional edge, that is, to graphs having one cycle (closed loop). 
The average number of unicyclic graphs is, from (|19|) . finite when N — > oo. 

• V — E < —1: the average number of components with more than one cycle vanishes in the large N limit; those 
graphs are unlikely to be found and can be ignored [69j. 

Obviously a 2-XORSAT formula with tree structure is always satisfiable (70|. Hence dangerous subformulas, as far 
as satisfiability is concerned, are associated to unicyclic graphs. A simple thought shows that a unicyclic formula is 
satisfiable if and only if the number of edges carrying label 1 along the cycle is even. Since the values attached to 
the edges (second members in the formula) are uncorrelated with the topology of the subgraph (first members) each 
cycle is satisfiable with probability one half. We end up with the simple formula 



P S AT(N,a) = (2-°^) 



(20) 



where C(G) denotes the number of cycles in G, and (.) the average over G. For a reason which will become clear 
below let us classify cycles according to their length L. How many cycles of length L can we construct? We have to 
choose first L vertices among N, and join them one after the order according to some order. As neither the starting 
vertex nor the direction along the cycle matter, the average number of L-cycles is 



N(N-1)...(N-L + 1) 
Nl = 2L X 



Ar = 



2L 



(21) 



when N — > oo. As the emergence of a cycle between L vertices is a local event (independent of the environment) 
we expect the number of L-cycles to be Poisson distributed in the large N limit with parameter A^. This statement 
can actually be proven, and extended to any finite collection of cycles of various lengths in the infinite size limit, 
the joint distribution of the numbers of cycles of lengths 1, 2, ... ,L is the product of Poisson laws with parameters 
Ai, A2, . . . , Al calculated in (l2"Tj) . The probability of satisfaction (|20p therefore converges to 



lira P S AT(N,a) 

N—>oo 



n e 



L>L„ 



C>0 



(Al/2) 
C! 



n 

L>Ln 



-Ai/2 



(22) 



where Lo is the minimal cycle length. In normal random graphs Lq = 3 since triangles are the shortest cycles. However 
in our 2-XORSAT model any equation, or more precisely, any first member can appear twice or more, hence Lq = 2. 
We conclude that [19( 



lim Psat(N, a) = e a l 2 (1 - 2a)* when a < a c 

N— >oo 



(23) 



The agreement of this result with the large size trend coming out from numerical simulations is visible in Figure 2] 
As Psat is a decreasing function of a it remains null for all ratios larger than a c . The non analyticity of Psat at a c 
locates the Sat/Unsat phase transition of 2-XORSAT. 
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It is an implicit assumption of statistical physics that asymptotic results of the kind of (f23|) , rigorously valid in the 
N — > oo limit, should reflect with good accuracy the finite but large N situation. An inspection of Figure 2] shows this 
is indeed the case. For instance, for ratio a = .3, ([23]) cannot be told from the probability of satisfaction measured 
for formulas with N — 100 variables. This statement does not hold for a = .4, where the agreement between infinite 
size theory and numerics sets in when N = 1000 at least. It appears that such finite-size effects become bigger and 
bigger as a gets closer and closer to the Sat/Unsat threshold. This issue, of broad importance in the context of phase 
transitions and the pratical application of asymptotic results, is studied in Section III HI 



F. Large deviations for Psat (II): bounds in the Unsat phase of 2-XORSAT. 

Consider ratios a > a c . The giant components of the corresponding formulas contain an extensively large number 
of independent cycles, so we expect from (|2T)|) that the probability of satisfaction is exponentially small in N, Psat — 
exp(— iVa^a) + o(N)). Lower and upper bounds to the rate function 0J2 can be obtained from, respectively, the first 
and second moment inequalities described in |5J Denoting by Af the number of solutions of a formula Psat is the 
probability that Af > 1, and is bracketed according to (|B2|) . 

To calculate the first moment of Af remark that an equation is satisfied by one half of the configurations. This 
result remains true for a restricted set of configurations when we average over the possible choices of (the second 
member of) the equation. The average number of solutions is thus 2 /2 , from which we get 

w 2 (ot) > (a- l)ln2 . (24) 

This lower bound is useless for a < 1 since LO2 is positive by definition. As for the upper bound we need to calculate 
the second moment (Af 2 ) of Af. As equations are independently drawn 

(N 2 )=Y J l{X,Y) M (25) 

X,Y 

where the sum is carried out over the pairs X, Y of configurations of the N variables, and q(X, Y) is the probability 
that both X and Y satisfies the same randomly drawn equation, q can be easily expressed in terms of the Hamming 
distance d between X and Y, defined as the fraction of variables having opposite values in X and Y. The general 
expression for K-XORSAT is [Si 

q(d) = \(l-(l-2d) K ) (26) 

and we specialize in this section to K = 2. Going back to ([23]) we can sum over Y at fixed X, that is, over the 
distances d taking multiple values of jj with the appropriate binomial multiplicity, and then sum over X with the 
result 

(AA 2 ) = 2 N ( q(d) M = exp(7V max A(d, a) + o(N)) (27) 
d 

in the large N limit, where 

A{d,a) = (2a- 1) In 2- dlnd- (1 - d)ln(l - d) + alnq(d) . (28) 

For a < I the maximum of A is located in d* = |, and equal to A* =0. When a > h, A has two global maxima 
located in d*(a) < \ and 1 — d*(a), with equal value A* (a) > 0. 

We plot in Figure [5] the lower (|2~4"1) and upper bounds to the rate function, 

u 2 (a) <2(l-a)ln2- max A(d,a) (29) 

de[0;l] 



from (|B2j) . At large ratio both bounds asymptotically match, proving that uj2(a) = (a — l)ln2 + 0(e ). As the 



ratio departs from its threshold value by e = a — a c the upper bound grows quadratically, A*(a c + e) ~ |e 2 + 0(e 3 ). 



Numerics suggest that the increase of the rate function is slower, 

w 2 (a c + e)~r!e 3 + 0(e 4 ) , (30) 

for some constant fl ~ 1 (Figure [5]). We will see in Section Hill that a sophisticated statistical physics technique, called 
the replica method, actually predict this scaling with Q, = ||. Actually the rate function can be estimated with the 
replica approach for any ratio a with the result shown in Figure [5] 
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FIG. 5: Rate function 102(a) associated to the probability of satisfaction of 2-XORSAT formulas with ratio a. The dotted line 
is the upper bound (|28[) and the dot-dashed line the lower bound (|24[1 . The full line is the output of the replica calculation of 
Section Till Dl squares and circles represent numerical results for N = 200, 100 respectively from 10 6 formulas. Inset: Psat as 
a function of the size N at the Sat/Unsat ratio. The slope — (|39[) is shown for comparison. 



G. Order parameter and symmetry breaking 

What is the meaning of the Hamming distance d* (a) appearing in the calculation of the second moment of the 
number of solutions? An easy guess would be the average distance between pairs of solutions 

, / x ,. / ^xv solutions of f Y) 

d av (a) = hmj ^ ) F (31) 

where the average is taken over the satisfiable formulas F with ratio a, and d(X, Y) denotes the (intensive) Hamming 
distance between two solutions X, Y. However an inspection of the calculation of Section Til Fl shows that 

,Y solutions of f 

d (a) = hm^ > ^d av (a). (32) 

Actually, though d*(a) is not the average distance between solutions with the unbiased distribution over formulas, it 
is the average distance for a biased distribution where each formula is weighted with 

N{Ff 

W{F) = E^W (33) 

as can be readily checked upon insertion of w(F) in the numerator of (f3"Tj) . We will see in Section (jlllj) how to calculate 
average properties with the unbiased measure. 

Even so definition (|32|) (and (f3Tj) too) is sloppy. If X is a solution so is —X, the configuration where variables values 
are flipped. Thus the average distance, whatever the weights over formulas, is equal ^ for any N\ The difficulty comes 
from the ambiguity in how the thermodynamic limit is taken, and is the signature of spontaneous symmetry breaking. 
In the low temperature phase of the Ising model the magnetization is either m* > or — m* < if an external field 
h with, respectively, positive or negative vanishing amplitude is added prior to taking the infinite size limit. In the 
present case what plays the role of the field is a coupling between solutions as is well-known in splin-glass theory [59l | . 
Inserting exp[— N h d(X, Y)] in the numerator of (f3"2"|) we obtain, when — * 00, d* if h ^ + and 1 — d* if h — > CP. 
The density /i of probability of distances d between solutions, with the biased measure (|33|) . is concentrated below 
the Sat/Unsat threshold, 

fi(d) =5(d-~) for a<a C: (34) 
and split into two symmetric peaks above the critical ratio, 

H(d) = — d*) +^S(d- (1- d*)) for a>a c . (35) 



The concept of spontaneous symmetry breaking will play a key role in our study of 3-XORSAT (Section IIV C|l . 
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FIG. 6: Probability that a random 3-XORSAT formula is satisfiable as a function of the ratio a of equations per variable, and 
for various sizes N. The dotted line locates the threshold a c — 0.918. 



H. Finite-size scaling (II): critical exponents 

Let us summarize what we have found about the probability of satisfying random 2-XORSAT formulas in Section 
III El and III Fl Close to the transition we have from |23|) and (|30|) . 



, . / iln(-e) when e < 0, N -> oo 

\nP SAT (N,a c + e)d [ + QNe3 ^ e>0>JV>1 



The lesson of Section lll Al is that In Psat may have a non trivial limit when N — > oo, e — * provided we keep y = e AT* 1 
constant. For 1-XORSAT the exponent ip was found to be equal to i, and lnPs^y to converge to the scaling function 
&i(y) ©■ The situation is similar but slightly more involved for 2-XORSAT. A natural assumption is to look for the 
existence of a scaling function such that 

\nP SAT {N,e) ~ NP $ 2 (eA^) . (36) 



Let us see if (|36|) is compatible with the limiting behaviours (136)) . Fixing e < and sending N — > oo we obtain, for 
y = eN^ -> -oo, ± In \y\ - | In AT for the l.h.s, and iV x $ 2 (y) for the r.h.s. Hence p = as in the 1-XORSAT case, 
but an additive correction is necessary, and we modify scaling Ansatz (|36| into 

In Psat(N, e) ~ $ 2 (y = e AT^) - £ InN. (37) 

The above equation is now compatible with (|36|) if $ 2 (y) ~ j In |y| when y — > — oo. Fixing now e > and sending A^ 
to infinity we see that (|3"6")l is fulfilled if < i ) 2 (y) ~ — f2 y 3 when y — > +oo and 

V = i . (38) 

The above value for tp is expected from the study of random graphs [l2j and is related to the size 

the largest components at the percolation threshold (Section IIIDj) . ip is called critical exponent and characterize the 

width of the critical region of 2-XORSAT. Loosely speaking it means that a formula of with N variables and y + A 

equations is 'critical' when A ~ iV§. This information will be useful for the analysis of search algorithms in Section 

EH 

A consequence of (137I38[) is that, right at the threshold, the probability of satisfaction decays as|72j| 



SAT 



(N,a c )~N--2. (39) 



This scaling agrees with numerical experiments, though the small value of the decay exponent makes an accurate 
check delicate (Inset of Figure [5]) . 



I. First and second moments inequalities for the 3-XORSAT threshold 

Figure [S] shows the probability that a random 3-XORSAT formula is satisfiable as a function of a for increasing 
sizes N. It appears that formulas with ratio a < a c ~ 0.92 are very likely to be satisfiable in the large A^ limit, while 
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formulas with ratios beyond this critical value are almost surely unsatisfiable. This behaviour is different from the 
2-XORSAT case (Figure 2]) in that Psat seems to tend to unity below threshold. 

It is important to realize that, contrary to the 2-XORSAT case, the Sat/Unsat transition is not related to connec- 
tivity percolation. Consider indeed a variable, say, x\. This variable appear, on average, in 3a equations. Each of 
those equations contain other 2 variables. Hence the 'connectivity' of X\ is c = 6a, which is larger than unity for 
a p = g. In the range [a p , a c ] the formula is percolating but still satisfiable with high probability. The reason is that 
cycles do not hinder satisfiability as much as in the 2-XORSAT case. 

Use of the first and second moment inequalities ((Bj for the number Af of solutions provides us with upper and lower 
bounds to the Sat/Unsat ratio a c . The calculation follows the same line as the one of the 2-XORSAT case (Section 
III F|) . The first moment (Af) = 2 Ar ( 1-Q ) vanishes for ratios larger than unity, showing that 

a c < an = 1 . (40) 

This upper bound is definitely larger than the true threshold from the numerical findings of Figure [Sj We have 
already encountered this situation in 2-XORSAT: in the ^ < a < 1 range formulas are unsatisfiable with probability 
one (when N — > oo), yet the average number of solutions is exponentially large! The reason is, once more, that the 
average result is spoiled by rare, satisfiable formulas with many solutions. 

As for the second moment expression (|27I28|) still holds with q(d) given by (|26p with K = 3. The absolute maximum 
of the corresponding function A(d, a) is located in d* = | when a < a2 — 0.889, and d* < ^ when a > a 2 - In the latter 
case (AT 2 ) is exponentially larger than (Af) 2 , and the second moment inequality (|B2[) does not give any information 
about Psat- In the former case (Af 2 ) and (Af) 2 are equivalent to exponential-in-A order. It is shown in[C]that their 
ratio actually tends to one as N — ► oo. We conclude that formulas with ratios of equations per variable less than a 2 
are satisfiable with high probability in the infinite size limit, or, equivalently [20l |. 

a c > a 2 ~ 0.889 . (41) 

Unfortunately the lower and upper bounds do not match and the precise value of the threshold remains unknown at 
this stage. We explain in the next section how a simple preprocessing of the formula, before the application of the 
first and second moment inequalities, can close the gap, and shed light on the structure of the space of solutions. 



J. Space of solutions and clustering 

We start from a simple observation. Assume we have a formula F of 3-XORSAT where a variable, say, x, appears 
only once, that is, in one equation, say, E : x + y + z = 0. Let us call F' the subformula obtained from F after 
removal of equation E. Then the following statement is true: F is satisfiable if and only if F' is satisfiable. The proof 
is obvious: whatever the values of y, z required to satisfy F' equation E can be satisfied by an adequate choice of x, 
and so can be the whole formula F. 

In a random 3-XORSAT formula F with ratio a there are about N x 3a e~ 3a variables appearing only once in 
the formula. Removal of those variables (and their equations) produces a shorter formula with O(N) less equations. 
Furthermore it may happen that variables with multiple occurrences in the original formula have disappeared from 
the output formula, or appear only once. Hence the procedure can be iterated until no single-occurrence variables are 
present. We are left with F 2 , the largest subformula (of the original formula) where every variable appears at least 
twice. 

Many questions can be asked: how many equations are left in F 2 ? how many variables does it involve? how many 
solutions does it have? Giving the answers requires a thorough analysis of the removal procedure, with the techniques 
exposed in Section IV El ItI [26l [46j . The outcome depends on the value of the ratio compared to 

a d = rain - l ° E ^ ~ h ' ~ 0.8184 . . . (42) 
b 3 b 2 

hereafter called clustering threshold. With high probability when N — ► oo F 2 is empty if a < ad, and contains an 
extensive number of equations, variables when a > a<j. In the latter case calculation of the first and second moments 
of the number of solutions of F 2 shows that this number does not fluctuate around the value e ArSc!l ' 3t,!, '( Q -' +0 ( Ar ) where 

Sdusteria) = (6 - 3a b 2 + 2a b 3 ) In 2 (43) 

and b is the strictly positive solution of the self-consistent equation 



l-b = er 3ab2 



(44) 
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FIG. 7: Entropies (base 2 logarithms divided by size N) of the numbers of solutions and clusters as a function of the ratio 
a. The entropy of solutions equals 1 — a for a < a c ~ 0.918. For a < aj ~ 0.818, solutions are uniformly scattered on the 
iV- dimensional hypercube. At ay the solution space discontinuously breaks into disjoint clusters. The entropies of clusters, 
Sduster, and of solutions in each cluster, Si„, are such that s c iuster + Si n = s. At q c the number of clusters stops being 
exponentially large (s c iuster = 0). Above a c there is almost surely no solution. 

Hence F2 is satisfiable if and only if a < a c defined through s c i us ter{ot c ) = 0, that is, 

a c ~ 0.9179... . (45) 

This value is, by virtue of the equivalence between F and F2 the Sat/Unsat threshold for 3-XORSAT, in excellent 
agreement with Figure [6l 

How can we reconstruct the solutions of F from the ones of F2I The procedure is simple. Start from one solution 
of F2 (empty string if a < ad)- Then introduce back the last equation which was removed since it contained n > 1 
single-occurrence variable. If n = 1 we fix the value of this variable in a unique way. If n = 2 (respectively n = 3) 
there are 2 (respectively, 4) ways of assigning the reintroduced variables, defining as many solutions from our initial, 
partial solution. Reintroduction of equations one after the other according to the Last In - First Out order gives us 
more and more solutions from the initial one, until we get a bunch of solutions of the original formula F. It turns out 
that the number of solutions created this way is e JVs i»( a )+°C) where 

Sin (a) = (1 - a) In 2 - s cluster (a) . (46) 

The above formula is true for a > ay, and should be intended as Si n (a) — (1 — a) In 2 for a < ad- These two entropies 
are shown in Figure [7] The total entropy, s*(a) = Sj„(a) + s c /„ ster (a), is simply (1 — a) ln2 for all ratios smaller 
than the Sat/Unsat threshold. It shows no singularity at the clustering threshold. However a drastic change in the 
structure of the space of solutions takes place, symbolized in the phase diagram of Figure [8) 

• For ratios a < ad the intensive Hamming distance between two solutions is, with high probability, equal to 
d = 1/2. Solutions thus differ on N/2 + o(N) variables, as if they were statistically unrelated assignments of the 
N Boolean variables. In addition the space of solutions enjoys some connectedness property. Any two solutions 
are connected by a path (in the space of solutions) along which successive solutions differ by a bounded number 
of variables. Losely speaking one is not forced to cross a big region prived of solutions when going from one 
solution to another. 

• For ratios a > ad the space of solutions is not connected any longer. It is made of an exponentially large (in 
N) number M c iu = e N Sc!t " ter of connected components, called clusters, each containing an exponentially large 
number Mm — e N Sin of solutions. Two solutions belonging to different clusters lie apart at a Hamming distance 
d c iu = 1/2 while, inside a cluster, the distance is di n < d c i u . b given by (|44p is the fraction of variables having 
the same value in all the solutions of a cluster (defined as the backbone). 

We present in Sections IIIII and IIVI statistical physics tools developed to deal with the scenario of Figure [5] 

III. ADVANCED METHODS (I): REPLICAS 

A. From moments to large deviations for the entropy 

The analysis of Section IIIFI has shown that the first, and second moments of the number Af of solutions are 
dominated by rare formulas with a lot of solutions. Let us define the intensive entropy s through JV = e N s . As JV is 
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FIG. 8: Phase diagram of 3-XORSAT. A 'geometrical' phase transition takes place in the satisfiable phase at aj ~ 0.818. At 
small ratios a < ad solutions are uniformely scattered on the N- dimensional hypercube, with a typical normalized Hamming 
distance d = J. At a,i the solution space discontinuously breaks into disjoint clusters: the Hamming distance di n ~ 0.14 
between solutions inside a cluster is much smaller than the typical distance d c i u — i between two clusters. 



random (at fixed a,N) so is s. We assume that the distribution of s can be described, in the large size limit, by a 
rate function uj(s) (which depends on a). Hence, 

(Af q ) = J dse- N "W x (e Ns ) q ~exp[N max (qs-uj(s))] (47) 

using the Laplace method. If we are able to estimate the leading behaviour of the q th moment of the number of 
solutions when N gets large at fixed a, 

(AT»)~e Wfl W, (48) 

then uj can be easily calculated by taking the Legendre transform of g. In particular the typical entropy is obtained 
by s* = ^(q -> 0). This is the road we will follow below. We will show how g(q) can be calculated when q takes 
integer values, and then perform an analytic continuation to non integer q. The continuation leads to substantial 
mathematical difficulties, but is not uncommon in statistical physics e.g. the q — ► 1 limit of the q-state Potts model 
to recover percolation, or the n — > limit of the O(n) model to describe self- avoiding walks. 

To calculate the q th moment we will have to average over the random components of formulas F, that is, the K- 
uplets of index variables in the first members and the v = 0, 1 second members. Consider now homogeneous formulas 
Fh whose first members are randomly drawn in the same way as for F, but with all second members v = 0. The 
number Mh of solutions of a homogeneous formula is always larger or equal to one. It is a simple exercise to show 
that 

(N q+1 ) = 2 Ar ( 1 -") x ((J\fh) 9 ) , (49) 

valid for any positive integer g[z3|. Therefore it is sufficient to calculate the moments of Afh = e N 9h ^ since (|49)) 
gives a simple identity between g(q + 1) and g%{q). This technical simplification has a deep physical meaning we will 
comment in Section TlV CI 



B. Free energy for replicated variables 



The q power of the number of solutions to a homogeneous system reads 



X 



M q 
X 1 ,X 2 ,...,Xi 1=1 a=l 



(50) 



where ei{X) is 1 if equation £ is satisfied by assignment X. The last sum runs over q assignments X a , with a = 
1,2, ... ,q of the Boolean variables, called replicas of the original assignment X. It will turn useful to denote by 
Xi = (xj,xf, . . . ,xf) the (/-dimensional vector whose components are the values of variable X{ in the q replicas. To 
simplify notations we consider the case K = 3 only here, but extension to other values of K is straightforward. 
Averaging over the instance, that is, the triplets of integers labelling the variables involved in each equation £, leads 
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to the following expression for the q th moment, 

(W) = E <lW a )> 



M 



X',X 2 X' o=l 



E 



X 1 ,X' 2 ,....Xi 



1 

iV3 



E 

l<i,j,k<N 



o 



M 



(51) 



where <5# = 1 if the compoments of x are all null mod. 2, and otherwise. We now procede to some formal 
manipulations of the above equation (|5ip . 

First step. Be X = {AT 1 , A 2 , . . . , A 9 } one of the 2 qN replica assignment. Focus on variable i, and its attached 
assignment vector, Xi. The latter may be any of the 2 q possible vectors e.g. Xi — (1, 0, 1, 0, 0, ... , 0) if variable X{ is 
equal to in all but the first and third replicas. The histogram of the assignments vectors given replica assignment 
X, 



1 N 



(52) 



counts the fraction of assignments vectors Xi having value x when i scans the whole set of variables from 1 to N. Of 
course, this histogram is normalised to unity, 



E^ 



1 



(53) 



where the sum runs over all 2 q assignment vectors. An simple but essential observation is that the r.h.s. of l|51[) may 
be rewritten in terms of the above histogram, 



1 

N3 



E 5 s z +x ]+ x k = p( s ) p{^) p( s + f • 



l<i,j,k<N x,x' 

Keep in mind that p in (|52l54p depends on the replica assignement X under consideration. 



(54) 



Second step. According to (|54|) , two replica assignments X\ and X2 defining the same histogram p will give equal 
contributions to {{Mh) )■ The sum over replica assignments X can therefore be replaced over the sum over possible 
histograms provided the multiplicity M of the latter is taken properly into account. This multiplicity is also equal to 
the number of combinations of N elements (the Xi vectors) into 2 q sets labelled by x and of cardinalities N p{x). We 
obtain 



(norm) 

J- e NG h {{p},a)+o(N) 
{p} 



((■A4r>= ); e^WT^^ , (55) 
where the (norm) subscript indicates that the sum runs over histograms p normalized according to (|53p . and 



Gh({p},a) = - ^2p(x) lnp(x) +a In ^p(z) p(x') p{x + x" ) 



(56) 



In the large N limit, the sum in (|55|) is dominated by the histogram p* maximizing the functional Qh- 

Third step. Maximisation of function Qh over normalized histograms can be done within the Lagrange multiplier 
formalism. The procedure consists in considering the modified function 



Ql M ({p}, A, a) = Q h ({p}, a) + A ( 1 - £ p(x 



(57) 



and first maximise Qi M with respect to histograms p without caring about the normalisation constraint, and then 
optimise the result with respect to A. We follow this procedure with Qh given by (f56|) . Requiring that Gh M be 
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maximal provides us with a set of 2 q coupled equations for p* , 

]np*(g) + l + X-3a - — ; — =0, (58) 

2^ P* (%) P* {x") P* (x + x") 

x' ,x" 

one for each assignment vector x. The optimisation equation over A implies that A in (|58[l is such that p* is normalised. 
At this point of the above and rather abstract calculation it may help to understand the interpretation of the optimal 
histogram p* . 



C. The order parameter 

We have already addressed a similar question at the end of the second moment calculation in Section III Gl The 
parameter d* coming out from the calculation was the (weighted) average Hamming distance ([32]) between two 
solutions of the same random instance. The significance of p* is identical. Consider q' solutions labelled by a = 
1,2, ...,q' of the same random and homogeneous instance and a variable, say, a;,. What is the probability, over 
instances and solutions, that this variable takes, for instance, value in the first and fourth solutions, and 1 in all 
other solutions? In other words, what is the probability that the assignment vector a:, = (xj , x\ , . . . , xf ) is equal to 
f = (0, 1,1,0,1,1,...,!)? The answer is 

/ 1 M q \ 

p^=(tkw £ ^nn^ a )) (59) 

\ V ' Xi,X*,...,X<i' I=la=l / 

where the dependence on i is wiped out by the average over the instance. The above probability is an interesting 
quantity; it provides us information about the 'microscopic' nature of solutions. Setting q' = 1 gives us the probabilities 
p(0),p(l) that a variable is false or true respectively, that is, takes the same value as in the null assignment or not. 
For generic q' we may think of two extreme situations: 

• a flat p over assignment vectors, p(x') = l/2 q , corresponds to essentially orthogonal solutions; 

• on the opposite, a concentrated probability e.g. p(x') — Ss> implies that variables are extremely constrained, 
and that the (almost) unique solution is the null assignment. 

The careful reader will have already guessed that our calculation of the q th moment gives access to a weighted 
counterpart of p. The order parameter 

1 /M q \ 

U " ; ' X\X 2 ,...,Xi \Z=la=l / 

is not equal to p even when q = q' . However, at the price of mathematical rigor, the exact probability p over vector 
assignments of integer length q' can be reconstructed from the optimal histogram p* associated to moments of order 
q when q is real-valued and sent to 0. The underlying idea is the following. Consider (|60[) and an integer q' < q. 
From any assignment vector x of length q, we define two assignment vectors x',x" of respective lengths q',q — q' 
corresponding to the first q' and the last q — q' components of x respectively. Summing (|60[) over the 2 q ~ q assignment 
vectors x" gives, 

x" {J^ a } \ / 

As q now appears in the powers of Afh in the numerator and denominator only, it can be formally send to zero at 
fixed q' , yielding 

lim =p(3f) (62) 



from (|59[) . This identity justifies the denomination order parameter given to p* 
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Having understood the significance of p* helps us to find appropriate solutions to ([58]) . Intuitively and from the 
discussion of the first moment case q — 1, p is expected to reflect both the special role of the null assignment (which 
is a solution to all homogeneous systems) and the ability of other solutions of a random system to be essentially 
orthogonal to this special assignment. A possible guess is thus 

P(a?) = ^ + b6 g , , (63) 

where b expresses some degree of 'correlation' of solutions with the null one. Hypothesis (f6"3"| interpolates between the 
fully concentrated (b — 1) and flat (b — 0) probabilities, b measures the fraction of variables (among the N ones) that 
take the values in all q' solution, and coincides with the notion of backbone introduced in Section Hi Jl Hypothesis 
(f6"3"| is equivalent, from the connection (f6"2")l between p and the annealed histogram p* to the following guess for the 
solution of the maximisation condition (|58p , 

p *(S)= l —± + b8 s . (64) 

Insertion of Ansatz ([64]) in (|58|) shows that it is indeed a solution provided b is shrewdly chosen as a function of q and 
a, b = b* (q, a). Its value can be either found from direct resolution of (|58|) . or from insertion of histogram (|64|) in Qh 
(|56|) and maximisation over 6, with the result, 

gh{q,a) = max A h (b,q 7 a) (65) 

0<6<1 



where 



Mb, q,<*) = (l - ^) (1 ~ b) In (66) 
b H In lb H + a In lb' 



2 q ) \ 2i J \ 2« 

where the maximum is precisely reached in b* . Notice that, since p* in (|64[) is entirely known from the value of 6*, 
we shall indifferently call order parameter p* , or b* itself. 



D. Results 



Numerical investigation of Ah (|66p shows that: for a < chm{c[) the only local maximum of Ah is located in b* = 0, 
and Ah(q, a) = q(l — a) ln2; when ajvr(g) < a < a*(q), there exists another local maximum in b > but the global 
maximum is still reached in b* — 0; when a > a* (3), the global maximum is located in b* > 0. This scenario extends 
to generic q the findings of the second moment calculation carried out in Section III Fl The olm and a* lines divide 
the q, a plane as shown in Figure [5] Notice that, while the black dots in Figure [5] correspond to integer- valued q, the 
continuous lines are the output of the implicit analytic continuation to real q done by the replica calculation. 

Taking the derivative of l|65p with respect to q and sending q — > we obtain the typical entropy of a homogeneous 
3-XORSAT system at ratio a, 

s* h (a) - In 2 x max [(1 - 6)(l - m(l - b)) - a (1 - b 3 )] . (67) 

The optimal value for b coincides with the solution of (|4~4"1) . The typical entropy is plotted in Figure [TU1 and is equal 
to: 

• (1 — a) In 2 when a < a c ~ 0.918 (Figure^; in this range of ratios, homogeneous and full (with random second 
members) systems have essentially the same properties, with the same cluster organisation of solutions, and 
identical entropies of solutions. 

• a positive but rapidly decreasing function given by (|67[) when a > a c ; above the critical ratio, a full system has 
no solution any more, while a homogeneous instance still enjoys a positive entropy. The expression for s h (a) 
coincides with the continuation to a > a c of the entropy Si n (a) (|4"6"| of solutions in a single cluster for a full 
system. In other words, a single cluster of solutions, the one with the null solution, survive for ratios a > as in 
homogeneous systems. 
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FIG. 9: The q, a plane and the critical lines qa/(q) (dashed), a*(q) (full tick), and a s (q) (full thin) appearing in the calculation 
of the q th moment for homogeneous 3-XORSAT systems. Full dots correspond to integer q values, while continuous curves 
result from the analytic continuation to real q. The fraction of variables in the backbone, b* , vanishes below the line om(?); 
the global maximum of Ah in (|66|) is located in b* > for ratios a > a*(q). Ansatz (|64l) is locally unstable in the hardly visible 
domain q < 0, om (q) < a < a s (q). 



Atypical instances can be studied and the large deviation rate function for the entropy can be derived from (|65p for 
homogeneous systems, and using equivalence (|49p. for full systems. Minimizing over the entropy we obtain the rate 
function 0)3(0) associated to the probability that a random 3-XORSAT system is satisfiable, with the result shown 
in Figure ITTJl As expected we find 0J3 = for a < a c and ui^ > for a > a c , allowing us to locate the Sat/Unsat 
threshold. 

Notice that the emergence of clustering can be guessed from Figure [HI It coincides with the appearance of a local 
maximum of Ah (I66[) with a non vanishing backbone b. While in the intermediate phase ad < a < a c , the height 
of the global maximum equals the total entropy s*, the height of the local maximum coincides with the entropy of 
clusters s c i uste r (SSI). 



E. Stability of the replica Ansatz 

The above results rely on Ansatz (|64[) . A necessary criterion for its validity is that p* locates a true local maximum 
of Gh , and not merely a saddle-point. Hence we have to calculate the Hessian matrix of Gh in p* , and check that the 
eigenvalues are all negative @. Differentiating (j56"|) with respect to p(x) and p(x') we obtain the Hessian matrix 

-*s $x+x> p*(x + x') N(x) N(x') , . 

H(x, *) = + 6a _ 9a J-l , (68 ) 

where D = — h b 3 , N(x) = — h b 2 5$. We use b instead of b* to ligthcn the notations, but it is intended that 
b is the backbone value which maximizes Ah fl6"6")) at fixed q, a. To take into account the global constraint over the 
histogram (f53"| one can express one fraction, say, p(0), as a function of the other fractions p(x), i^O. Gh is now a 
fonction of 2 q — 1 independent variables, with a Hessian matrix H simply related to H , 

H(x, x') = H(x, x') - H(x, 0) - H(0, x') + H(0, 0) . (69) 

Plugging expression (|68[) into (Irj9^) we obtain 

H(x,x') = X B . &S+S' + 2<? — (Al - Xr) where 

/ h 11 h 4 

X L = 2 q 6a r - 9a(l - 2^) — 

V D (l-b)(l-b + 2ib) y ' D 2 

Diagonalization of H is immediate, and we find two eigenvalues: 

• Xl (non degenerate). The eigenmode corresponds to a uniform infinitesimal variation of p(x) for all x 7^ 0, that 
is, a change of b in (|64|) . It is an easy check that 

2 q d 2 A h 
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FIG. 10: Rate function u>3 for the probability of satisfaction of full (bottom curve) and entropy s£ of solutions for homogeneous 
(top curve) 3-XORSAT systems vs. a. The vertical dotted lines indicate the critical Sat/Unsat threshold, a c ~ 0.918. For 
a < a c u>3 — 0, and s% = (1 — a) In 2 is the same as for full systems. Above the threshold ui* < 0. Homogeneous systems are, 
of course, always satisfiable: the entropy s H is a positive but quickly decreasing function of a. 

where Ah is defined i n ([551) . As we have chosen b to maximize Ah this mode, called longitudinal in replica 
literature @, is stable |74j. 

• \r (2 q — 2-fold degenerate) : the eigenmodes correspond to fluctuations of the order parameter p transverse to 
the replica subspace described by (|64)l . and are called replicon in spin-glass theory [9(. Inspection of \r as a 
function of a, q shows that it is always negative when q > 0. For q < the replicon mode is stable if 

, s 1 - b 3 + 2i b 3 
a > aM = 6 6(1-6) • (72) 

which is a function of q only once we have chosen b — b*{q, a s ). 

The unstable region q < 0, olm{q) < a < a s ( q) is shown in Figure [9] and is hardly visible when q > —3. In this region 
a continuous symmetry breaking is expected [431 ] . In particular a s stay below the a* line for small (in absolute value) 
and negative q. We conclude that our Ansatz (|64p defines a maximum of Qh- 

Is it the global maximum of Qh 1 - There is no simple way to answer this question. Local stability does not rule out 
the possibility for a discontinuous transition to another maximum in the replica order parameter space not described 
by (f64|) . A final remark is that a similar calculation can be done for any value of K. The outcome for K = 2 is the 
rate function loi plotted in Figure in good agreement with numerics close to the threshold. 

IV. ADVANCED METHODS (II): CAVITY 

The cavity method, in the context of disordered systems, was historically developed as an alternative to the the 
replica method [43|. Its application to spin systems on random graphs is extensively explained in [44| . and we limit 
ourselves here to briefly show how it gives back the 3-XORSAT scenario of Section III J I [46j . 

Let us consider a system F involving variables Xi, i = 1, . . . , N. In the following we will indifferently use the variable 
Xi = 0, 1 or its spin representation Si = (— l) Xi = ±1 when convenient. Let us define the GS energy Ef {Si) of the 
system when the i th spin is kept fixed, that is, the minimal number of violated equations in F, taken over the 2 JV_1 
configurations. We may always write 

Ef{S i ) = -w i -h i S i , (73) 

where hi is called 'field' acting on spin Si. For a homogeneous system Ef (+1) = 0, and Ef (—1) = n, for some integer 
rii. Hence hi — 7J f takes half-integer values. 

The above definition can be extended to the case of I > 1 fixed spins. Let / c {1,2,..., N} be a subset of the 
indices of cardinal |/| > 2, and Si denote one of the 2^1 configurations of the spins Si, i £ I. The GS energy of F for 
given Si can in general be written as 

Ef {Si) = -wi-J2 h S t - Jl ' II 5 « ( 74 ) 

iei i'ci:\i'\>2 ier 



where the /i,s are the fields and the Jps are effective couplings between subsets of spins. 
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The basic cavity assumption is that effective couplings are vanishingly small: Jp = for every subset This 
apparently bold hypothesis critically relies on a general property of random graphs (from which our system is built 
on). Define the distance between two vertices as the minimal number of edges on pathes linking these two points. 
Then vertices in a finite subset are, with high probability when N — + oo, typically at infinite distance from each 
other [75|. When correlations between variables in GS extinguish with the distance i.e. when the correlation length is 
finite the cavity assumption is correct in the large iV limit [43|, [53| ■ The assumption will break down when correlations 
subsist on infinite distance, which happens to be the case in the clustered phase. 



A. Self-consistent equation for the fields 



Under the assumption that couplings between randomly picked up spins are null we are left with the fields only. 
The goal of this section is to show how to calculate those fields, or more precisely, their probability distribution. The 
derivation is based on the addition procedure already used in the calculation of the size of the giant component in 
random graphs (Section IIID[) . 

Consider a system F over N variables to which we want to add one equation involving one new variable S, and two 
variables Si, S2 appearing in F. The energy function associated to this equation is 

e(S,Si,S 2 ) = ^(l-aSSiS 2 ) (75) 

where a = +1, respectively —1, when the second member of the equation is 0, resp. 1. Let us calculate the GS energy 
of the new system F' = F + added equation when the new variable S is kept fixed, 

E F '(S)= min [e(S,Si,S2)+E[ 2 (Si,S 2 )] =-w-uS . (76) 

Sl,S2 

With the cavity hypothesis the couplings between spins Si,S2 is null and the minimization is straightforward. We 
deduce the following explicit expression for the field acting on S (called bias in the cavity literature [44|). 

u = ^ signal h 2 ) ■ (77) 

Suppose we now add i > 1 (and not only one) equations. The above calculation can be easily repeated. The absence 
of couplings make the total field acting on 5 a linear combination of the fields coming from each new equation, 

l 

&=xy\ (78) 

J'=l 

where m j is calculated from ([77)) and each pair of fields (h{, h 3 2 ) acting on the spins in the j th equation, j = 1, . . . ,1. 

How many equations should we add for our new system over N + 1 variables to have the same statistical features 
as old one over N variables? First £ should be Poisson distributed with parameter 3a. Then, given I, we randomly 
chose i pairs of variables; for each pair the corresponding bias u can be calculated from ([77j. Assume the output is a 
set of I independent biases, taking values 

{+j with probability a + 
with probability a (79) 
— 5 with probability a_ 

Obviously a + + a + a_ = 1 . Summing over the equations as in (|78[) we obtain the distribution of the field h acting 
on the new spin at fixed I, 



P(h\l) = Yl £ 0j £_) a + a °- 5 h-W+-l-) 



(80) 



Finally we sum over the Poisson distribution for £ to obtain the distribution of fields h, 
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In turn we calculate the distribution of the biases from the one of the fields through (|77|) . The outcome are the values 
of the probabilities (|79[) in terms of p, 

a+ = ^2 V(hi)p(h2) , a_ = ^2 P{hi)p(h,2) , 

hi,h2-hih2>0 hi,h2-hih2<0 

a Q = £ P(hi)p(h 2 ) = 2p(0)~p(0) 2 . (82) 

hi,h2'.hih2=Q 

The above equations together with (|8"Tj) define three self-consistent conditions for ao,a + ,a_. Notice that the free 
energy can be calculated along the same lines [44| . 



B. Application to homogeneous and full systems 

In the case of homogeneous systems (er = +1) we expect all the fields to be positive, and look for a solution of 1(82]) 
with a_ = 0. Then p(h) (|8ip is a Poisson distribution for the integer- valued variable 2h, with parameter 3aa+. The 
self-consistent equation (|82|) reads 

a = 1 - a+ = 2 e~ 3aa + - e~ 6aa + = 1 - (l - e - 3aa +) 2 (83) 
which coincides with (|4"4")) with the definition b = v / a+. As expected 

(84) 



is the fraction of frozen variables (which cannot be flipped from to 1 in GS assignments), in agreement with the 
notion of backbone of Section III Jl 

The energy is zero at all ratio a by construction. As for the entropy consider adding a new equation to the system 
F (but with no new variable) . With probability 1 — b 3 at least one of the three variables in the new equation e was not 
frozen prior to addition, and the number of solutions of the new system F + e is half the one of F. With probability 
b 3 all three variables are frozen in F (to the zero value) and the number of solutions of F + e is the same as the one 
of F. Hence the average decrease in entropy is 

Ns* h (a + 1) -Ns* h (a) ~ ^ = -(1 - b 3 ) In 2 . (85) 

The same differential equation can be obtained by differentiating (|67| . With the limit condition s£(a — > oo) = we 
obtain back the correct expression for the average entropy of homogeneous systems. The entropy is equal to (1 — a) In 2 
at a — a c , and becomes smaller when the ratio decreases. This shows that the solution 6 = must be preferred in 
this regime to the metastable b > solution. We conclude that the cavity assumption leads to sensible results for 
homogeneous systems at all ratios a. 

In full systems the sign a entering ([77|) takes ±1 values with equal probabilities. We thus expect p(h) to be an even 
distribution, and a + = a_ = ^(1 — ao). Remark that a solution with ao < 1 cannot exist in the satifiable phase. It 
would allow two added equations to impose opposite non zero biases to the new variable i. e. to constraint this variable 
to take opposite values at the same time. Given ag we calculate from (|5Tj) the probability that the field vanishes, 



P (Q) =e -3a(l-a )^ 



3a n 
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and, in turn, derive from (|82[) a self-consistent equation for ao- Numerical investigations show that ao = 1 is the 
unique solution for a < ax — 1.167. When a > ax there appears another solution with ao < 1. The clustering 
and Sat/Unsat transitions are totally absent. This result, incompatible with the exact picture of random 3-XORSAT 
exposed in Section Hi Ji shows that the simple cavity hypothesis does not hold for full systems. 



C. Spontaneous symmetry breaking between clusters 



In the clustered phase variables are known to be strongly correlated and the cavity assumption has to be modified. 
Actually from what we have done above in the homogeneous case we guess that the independence condition still holds 
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if we can in some way restrict the whole space of solutions to one cluster. To do so we explicitely break the symmetry 
between clusters as follows (iM [59|. 

Let S*, i = 1, . . . , N be a reference solution of a full satisfiable system F, and Fh the corresponding homogeneous 
system. We define the local gauge transform Si — > = Sj x S* . {S} is a solution of F if and only if {S} is a solution 
of -Ffr,. As the cavity assumption is correct for the homogeneous system we obtain the distribution of fields hi > 
from ([51]) . Gauging back to the original spin configuration gives us the fields 

hi = S* x hi . (87) 

It turns out that the above fields depend only on the cluster to which belong the reference solution. Indeed for the 
fraction 1 — b of the non frozen spins, hi = hi = 0. For the remaining fraction b of spins in the backbone hi > | and 
S* has a unique value for all solutions in the cluster ( Section III J[) . Hence the fields hi are a function of cluster (c) 

containing {S*}, and will be denoted by h\ . 

What modification has to be brought to the cavity assumption of Section [IV Al is now clear. Given a subset I of the 
spins with configuration Si we define ^(SV) as the GS energy over configurations in the cluster (c). Then the cavity 

assumption is correct (spins in / are uncorrelated) and Ei^ define the fields h^ . How do we perform this restriction 
in practice? A natural procedure is to break the symmetry between clusters in an explicit manner by adding a small 
coupling to the reference solution [Hil. [59j. Remark that symmetry was broken (naturally but explicitly!) in the case 
of homogeneous systems when we looked for a distribution p(h) with support on positive fields only. It is a remarkable 
feature of XORSAT (rather unique among disordered systems) that symmetry between disordered clusters can be 
broken in a constructive and simple way. 

The main outcome of the above discussion is that the field attached to variable i is not unique, but depends on the 
cluster (c). We define the distribution Pi(h) of the fields attached to variable i over the clusters (with uniform weights 
since all clusters contain the same number of solutions) [5fjj . The naive cavity assumption corresponds to 

Pi(h)=Sh-tH- (88) 

In presence of many clusters Pi{h) is not highly concentred. From (|8T[) and the fact that S* — ±1 depending on the 
cluster from which we pick up the reference solution we find that 

Pi(h) = l[5 h _ hi +5 h+k ] . (89) 

As hi is itself randomly distributed we are led to introduce the distribution V of the field distributions Pi(h). This 
mathematical object, V(jp(K)), is the order parameter of the cavity theory in the clustered phase [4^.[50j. 



D. Distribution of field distributions 

Let us see how V can be obtained within the one-more variable approach of Section [IV Al A new equation contains 
two variables Si, Sj from F, with fields h\ , h^ in each cluster (c). The bias u is a deterministic function of those 
two fields for each cluster ([77)) . We define its distribution over clusters p. As u can take three values only and p is an 
even distribution due to the randomness of the second member of the new equation we may write 

p[u) = (1 - ipy) 5 u + ^f (6 u _i + 5 u+ i) . (90) 

The weight tp is a random variable which varies from pair (ij) to pair. 

What is the probability distribution p(tp) of tp? Either the two variables in the pair belong to the backbone and 
they are frozen in all clusters; then u will be non zero and tp = 1. Or one (at least) of the two variables is not frozen 
and u = in all clusters, giving tp = 0. We may write 

P(tp) = (1 -w) Sp + wSp-i ■ (91) 

From the above argument we expect w = b 2 . Let us derive this result. 

Assume we add £ > 1 equations to our system. For each one of those equations a bias u J is drawn randomly 
according to distribution (|90p . Denote by m(< t) the number of those equations with parameter tp = 1; m is 
bmomiaily distributed with probability w among t. Then £ — m biases are null, and m biases are not equal to zero. 
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For the formula to remain satisfiable the non-zero biases must be all positive or negative see Section [IV Bl Hence 
the distribution of the field on the new variable is 

p m {h)= l -[5 h ^+5 h+ ~] , (92) 

in agreement with the expected form ([89]). The upperscript m underlines that field distributions with non zero 
probability are can be labelled by an integer m; they define a countable set and the distribution V can be defined as a 
discrete probability V m over the set of positive integers m. The probability V m of distribution (|92p is the convolution 
of binomial distribution for m at fixed I with the Poisson distribution over £, 

j-,m _ c -3 aw $ OC w) m 

ml 

Identities (|92I93[) fully determine the distribution of field distributions in term of a single parameter, w. 

To close the self-consistency argument consider the two variables in F in, say, the first added equation. Call h, h! 
their fields, distributed according to p m (h),p m i(h') for some m, m'. The bias created onto the new variable will be 
non zero if h and h! may both take non zeros value in some clusters, that is, if m and m! are not equal to zero. This 
translates into the mathematical identity 

w= VmV m , = (l-V ) 2 = (l-e~ 3aw ) 2 (94) 

from ([9"5|). The above equation coincides with (|4"4"|) for w = b 2 . Notice that w is equal to the probability a + that the 
bias is non zero in the homogeneous case ([79]) , in agreement with the discussion of Section [IV CI 

It is easy to find back the expressions for the entropies of clusters, s c iuster, and solutions in a cluster, Sj n , given 
in Section III Jl As for the latter entropy the argument leading to ([55]) can be repeated, with the modification that 
the second member of the added equation is not necessarily zero but the value it should have for the equation to be 
satisfied when all three variables are frozen. Hence ([85]) holds with s£ replaced with Sj n . As for the entropy of clusters 
the same argument again tells us that, on average, half of the clusters will disappear when the three variables are 
frozen and the second member of the equation is randomly chosen. Therefore 

dSd^ter = _ b 3 ]n2j (95) 

da 

in agreement with equations (|43I44|) . Summing differential equations ([95]) and (|85|) for s c iuster and Si n respectively 
shows that the total entropy of solutions is (1 — a) In 2 (Section III J[) . 



V. DYNAMICAL PHASE TRANSITIONS AND SEARCH ALGORITHMS 



The deep understanding of the statistical properties of 3-XORSAT makes this problem a valuable benchmark for 
assessing the performances of various combinatorial search algorithms. At first sight, the idea seems rather odd since 
3-XORSAT is a polynomial problem. Interestingly most of the search procedures devised to deal with NP-complete 
problems e.g. SAT have poor performances i.e. take exponentially long average running times on XORSAT above 
some algorithmic-dependent critical ratio ... The purpose of this Section is to present two algorithms exhibiting such 
a dynamical phase transition, and the techniques required for their analysis. 

A. Random WalkSAT (RWSAT): definition, worst-case bound 

The first algorithm we consider is the Random WalkSAT (RWSAT) algorithm introduced by Papadimitriou [58l |. 
RWSAT is based on the observation that a violated equation can be satisfied through negation of one of its variables: 

• Start from a randomly chosen configuration of the variables. Call energy the number E of 

UNSATISFIED EQUATIONS. 

• While E>1; 

- PICK UP UNIFORMLY AT RANDOM ONE OF THE E UNSATISFIED EQUATIONS; 
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FIG. 11: Fraction e of unsatisfied equations as a function of time t (number of steps divided by N) during the operation of 
RWSAT on a random 3-XORSAT formula at ratio a = 0.2 (left) and a = 0.4 (right) with N = 10 3 (dotted), 10 4 (dashed), 
10 (full curve) variables. Note the difference of horizontal scales between the two figures. Inset: blow up of the t G [9.5; 10.5] 
region; the amplitude of fluctuations around the plateau decreases with increasing size N. 

- pick up uniformly at random one of its 3 variables; 

- negate the value of this variable, update e\ 

• Print 'Satisfiable', and Halt. 

Notice that, as a result of the negation, some equations that were satisfied may become violated. Therefore the 
energy is not guaranteed to decrease with the number of steps of the algorithm. RWSAT is able to escape from local 
minima of the energy landscape, and is a priori capable of better performances. From the other hand, RWSAT may 
run forever... A major question is how long should the algorithm be running before we stop thinking that the studied 
system has solutions hard to find and get some confidence that there is really no solution. 

This question was addressed by Schoning [61], who showed that RWSAT could easily be used as a one-sided 
randomized algorithm 54] [76J . Consider one instance of 3-XORSAT and run RWSAT for 3N steps from a randomly 
chosen configuration of variables. Choose again a random initial configuration and run RWSAT another 3N steps, 
and so on ... The probability that no solution has been found after T repetitions of this procedure though the formula 
is satisfiable is 

PSAT<ex P l-Tx (-) 1 . (96) 

Hence we obtain a probabilistic proof that the instance is not satisfiable if the algorithm has run unsuccessfully for 
more than (^) N sets of 3N steps. It must be clear that this result holds for any instance, no assumption being made 
on the distribution of formulas. The probability appearing in (|96|) is on the random choices done by RWSAT and the 
choices of the restart configurations for a fixed formula. 

The proof of (|9"6"|) can be sketched as follows. Assume that the formula is satisfiable, and called X* one of its 
solutions. Consider now the (extensive) Hamming distance between the solution and the configuration X of variables 
produced by RWSAT at some instant. After each step only one variable is changed so D changes into D + 1 (bad 
move) or D — 1 (good move). Call x, y, z the variables in the equation which was not satisfied by X. One or three of 
those variables have opposite values in X*. In the latter case the flip is always a good move; in the former case the 
good move happens with probability i and a bad move with probability |. On the overall the probability of a good 
move is ^ at least. 

Think of D has the position of a random walker on the [0; N] segment. Initially the position of the walker is a 
binomial variable, centered in At each step the walker moves to the left with probability i, and to the right with 
probability |. We look for the probability p that the walker is absorbed by the boundary D = after S steps. A 
standard calculation shows that p is maximal for S — 3N, with the value p ~ {j) N ■ After T repetitions the probability 
of not having been absorbed is (1 — p) T < exp(— pT), hence (|96|) . The proof can be easily extended to i^-XORSAT 
with higher values of K. The number of repetitions necessary to prove unsatisifiability scales as ( 2 ^ J ^T 1 - > ) jv ; it is 
essentially equal to 2 N for large K, showing that RWSAT does not beat exhaustive search in this limit. 



B. Dynamical transition of RWSAT on random XORSAT instances 

Result (|96p is true for any instance; what is the typical situation for random systems? Numerical experiments 
indicate that there is critical value of the ratio of equations per variables, ~ 0.33, hereafter referred to as dynamical 
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FIG. 12: Fraction e p i ateau of unsatisfied equations on the plateau (A) and logarithm r res of the average resolution time divided 
by N (B) as a function of the ratio a of equations per variable. Diamonds are the output of numerical experiments, and 
have been obtained through average of data from simulations over 1,000 systems and runs of RWSAT for various sizes N, and 
extrapolation to N — > oo 62]. Full lines are theoretical approximations (|105p . (|110p . 



threshold, separating two regimes: 

• for a < ole j RWSAT generally finds a solution very quickly, namely with a number of flips growing linearly with 
the number of variables AT (77]. Figure [PT1 shows the plot of the fraction e of unsatisfied clauses as a function 
of the time (number of steps) T for one randomly drawn system with ratio a = 0.2 and N = 500 variables. 
The curve shows a fast decrease from the initial value (e(T = 0) = \ independently of a for large values of N, 
but deviations can be found at small sizes, see Figure [Tl]) down to zero on a time scale of the order of N 78}. 
The resolution time T res depends both on the system of equations under consideration and the choices of the 
algorithm; its average value scales as 

(T res ) = Nt res +o(N) . (97) 
where t res is an increasing function of a[79(. 

• for systems with ratios of equations per variable in the cue < ot < ot c range, the initial relaxation regime taking 
place on the O(N) time scale does not allow RWSAT to reach a solution (Figure fTTB). The fraction e of unsat 
equations then fluctuates around some plateau value e v \ a teau for a very long time. Fluctuations are smaller 
and smaller (and the height of the plateau better and better defined) as the size N increases. As a result 
of fluctuations, the fraction e of unsatisfied equations may temporarily either increase or decrease. When a 
fluctuation happens to drive RWSAT to e = 0, a solution is found and the algorithm stops. The corresponding 
resolution time, T res , is stochastic; numerical experiments for different sizes N indicate that its expectation 
value scale as 

(T res ) = exp{NT res + o(N)) . (98) 

where the coefficient r res is an increasing function of a. The plateau energy e p i a teau and the logarithm r res of 
the resolution time are shown in Figure 1121 

Notice that the dynamical threshold cie above which the plateau energy is positive is strictly smaller than the 
critical threshold a c ~ 0.918, where systems go from satisfiable with high probability to unsatisfiable with high 
probability. In the intermediate range ue < a < a c , systems are almost surely satisfiable but RWSAT needs an 
exponentially large time to prove so. The reason is that RWSAT remains trapped at a high energy level (plateau of 
Figure [T2"]) for an exponentially large time. The emergence of metastability can be qualitatively studied with simple 
tools we now expose. 



C. Approximate theory for the metastable plateau and the escape time 

Assume that after T steps of the algorithm the energy (number of unsatisfied equations) is Et > 1. Then pick up 
an unsatisfied equation, say, C, and a variable in C, say, x, and flip it. The energy after the flip is 



Et+i — — 1 — + S , 



(99) 
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where S (respectively U) is the number of equations including x which were satisfied (resp. unsatisfied after exclusion 
of equation C) prior to the flip. S and U are random variables with binomial distributions, 



Et-1-U 



M-E-t 



where the probabilities are intended over the formula content. Taking the average evolution equation we obtain 

(E T+1 ) = (E T ) - 1 - A«£ T ) - 1) + |(M - (E T )) . (101) 

The above equation is exact. It is now tempting to iterate it with time, from the initial condition {Et=q) = 4r- This 
is what we do hereafter but one should realize that this procedure is not correct from a mathematical standpoint. 
The catch is that one is allowed to average over the formula only once, and certainly not at each time step of the 
algorithm. Evolution equation (|101|) amounts to redraw randomly the instance at each time step, conditioned to the 
energy. This approximation nevertheless allows us to write down a simple equation for (Et), which captures much of 
the true behaviour of RWSAT. 

The next step in our analysis is the large size, large time limit. As the energy can typically change by a quantity of 
the order of unity in one time step we expect the fraction of unsatisfied equations to vary of a time scale of the order 
ofN, 

(Et) = M e(J^=t\ , (102) 

for some smooth function e(t) of the reduced time t. Finite difference equation pOip turns into a differential equation 
after insertion of (|102p . 

de 

— = -1 + 3a(l - 2e) , (103) 
at 

with the initial condition e(0) = i. Clearly (|103[) makes sense as long as e > 0; if e vanishes the algorithm stops. 
Resolution of (|103[) shows the following scenario. If a is smaller than 

OLE = ~ , (104) 

the fraction e of unsatisfied equations quickly decreases, and vanishes at some time t res {a). This regime corresponds 
to a successfull action of RWSAT in a 0(N) number of steps. t res is an increasing function of a which diverges as 
a — > a.E- Above this critical ratio e shows a different behaviour: after a decreasing transient regime e saturates to a 
positive plateau value 



(a) = i(l-^). (105) 



a / 

The value of the plateau energy is compared to numerics in Figure I12A . The agreement on the location of the 
dynamical threshold as well as the plateau energy are satisfactory. 

The remaining point is to understand how RWSAT finally finds a solution when a > a^. The above theory, based 
on taking the N — > oo limit first, washes out the fluctuations of the energy around its metastable value, of crucial 
importance for resolution [62l |. To take into account these fluctuations let us define the probability Q p i a teau{E) that 
the energy takes value E in the plateau regime of Figure [TTB . A stationary distribution is well defined if we discard 
the initial transient regime (choose large t) and collect values for E on exponentially large-in-Af time scales. The 
procedure is standard in the study of long-time metastable states. 

Within our draw-instance-at-each-step approximation we may write a self-consistent equation for the stationary 
distribution of energies, 

Qplateau(E) = ^ Proba[[/] Pn>ba[S] Q P lateau(E + 1 + U - S) (106) 

u.s 
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where the meaning of U, S was explained right after ([99)) . From Section IV Bl we expect fluctuations to decreases 
sharply with the system size. A reasonable guess for the scaling of the distribution with M is 



Iplateau(E) = exp 



(107) 



where to is the rate function associated to the fraction of unsatisfied equations. Plugging the above Ansatz into (| 106[) 
and taking the large M limit we find that ui fulfills the following differential equation 

-0. (108) 

where F (x, y) — iay{e^ x — l) + 3a(l — y)(e x — 1) — x. This equation has to be solved with the condition u (e p i a teau) = 0. 

An analytical solution can be found for (|108[) when we restrict to the vicinity of the dynamical transition i. e. to 
small values of u>. Expanding F to the second order in its first argument and solving (|108[) we obtain 

uj(e) ~ 2 (e - e p iateau) 2 , (109) 

where e p i a teau is defined in (|105[) . 

What happens when time increases is now clear. Assume we have run RWSAT up to time t ~ e Mr . Then 
configurations with energy e such that w(e) < t have been visited many times and are 'equilibrated' with probability 
P06p . (|109p . Configurations with energies outside the band e p i a t ea u ± \/t/2 are not accessible. When the time scales 
reaches 

r res = u(0) * J ( 1 - , (110) 



2 V 3a, 

zero energy configurations are encountered, and RWSAT comes to a stop. The agreement between the theoretical 
estimate (|110[) and the numerical findings (f9"5| visible in Figure [T2"B is acceptable in regard to the crudeness of the 
approximation done. 



D. Davis-Putnam-Loveland-Logemann (DPLL) algorithm 

The second procedure is the Davis-Putnam-Loveland-Logemann (DPLL) algorithm 22]. Contrary to RWSAT DPLL 
can provide exact proofs for unsatisfiability. The procedure, widely used in practice, is based on the trial-and-error 
principle. Variables are assigned according to some heuristic rule (split step), and equations involving those variables 
simplified. If an equation involving a single variable (unit-equation) appears its variable is chosen accordingly prior 
to any other heuristic assignment (unit-propagation) . If a contradiction is found (two opposite unit-equations) DPLL 
backtracks to the last heuristically assigned variable, flips it, and resumes the search process. The procedure halts 
either when all equations have been satisfied (a solution is then found), or when all possible values for the variables 
have been tried in vane and found to be contradictory (a proof of unsatisfiability is then obtained) . 

DPLL can be described as a recursive function of the variable assignment A. Given a system S DPLL is first called 
with the empty assignment A = 0: 

Procedure DPLL [A] 

• Let Sa be what is left from S given variable assignment A; 

• if Sa is empty, Print 'Satisfiable'; Halt; 

• If Sa contains a violated equation, Print 'Contradiction', Return; (backtracking) 

• Otherwise, let U be the set of unit-equations in Sa; 

- If U ^ 0, pick-up one of the equations in U, say, e, and CALL DPLL[AU{e}]; (unit-propagation) 

- IF U = 0, CHOOSE A NOT- YET-ASSIGNED VARIABLE, SAY, X, AND ITS VALUE V ACCORDING TO SOME 

heuristic rule, and call DPLL[AU{a; = v}}, then DPLL[AU{a; = v}]; (variable splitting) 
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FIG. 13: Median number of splits required by DPLL with the UC (left) and GUC (right) heuristics as a function of the ratio 
a, and for N = 20,40,60 variables (from bottom to top). Data have been extracted from the resolution of 10,000 randomly 
drawn systems; continuous lines are guidelines for the eye. Note the difference of (logarithmic) scale between UC and GUC 
curves showing that DPLL-GUC is much more efficient than DPLL-UC. The polynomial/exponential transition is located at 
ratios oe = | and as = 0.7507... for UC and GUC respectively. 



Rules for assigning variables in the absence of unit-equations are heuristic in that they aim at doing good assumptions 
i.e. diminishing as much as possible the search process to come from limited information about the current system of 
equations. Of course, perfect heuristic do exist: trying all possible values for not-yet-assigned variables would ensure 
that no wrong guess is ever done! But the time required would be exponentially long. In practice, heuristics have to 
make their decision in polynomial time. Two simple splitting heuristics are: 

UC: choose at random and uniformly any unset variable, and assign it to or 1 with equal probabilities (^). 

GUC: choose at random and uniformly any equation with minimal length i.e. involving 2 variables if any, or 
3 variables otherwise. Pick up at random and uniformly one its variable, and assign it to or 1 with equal 
probabilities (|). 

UC, which stands for unit-clause [l4j, amounts to make a random guess and is the simplest possible heuristic. GUC 
(Generalized UC) is more clever: each time a split is done from an equation with 2 variables, this equation is turned 
into a unit-equation, and eliminated through unit-propagation. In the following, we call DPLL-UC and DPLL-GUC 
the variants of DPLL based on the UC and GUC heuristics respectively. 

A measure of the computational effort required by DPLL is the number T sp u t of variable splittings. This number 
varies from system to system (at fixed number N of variables and ratio a), and from run to run of DPLL due to 
the stochasticity introduced by the heuristic rule. The outcome of numerical experiments for the median number of 
splits [8fJ. For a given size N T sp ut shows a maximum located around a ~ a c . If one fixes a T sp ut is an increasing 
function of the size A; numerical data support the existence of a dynamical threshold, a#, separating linear and 
exponential scalings in N, 

J Nt sp i it + o(N) if a < a. E , . 

ls ? ut ~ \ exp(Ar spW + o(N)) if a > a E ' 

where t sp ut and T sp ut are functions of the ratio a. The value of the dynamical threshold can be derived from theoretical 
calculations shown in Section IV El and is equal to Qe = | and cxe — 0.7507... for UC and GUC heuristics respectively. 
Three dynamical regimes are therefored identified [2|, |lfj : 

• Linear & satisfiable phase (a < as)'- systems with small ratios are solved with essentially no backtracking. A 
solution is found after O(N) splits. 



Exponential satisfiable phase (ag < a < a c ): systems with ratios slightly below threshold have solutions, but 
DPLL generally requires an exponential number of splits to find one of them. An explanation for this drastic 
breakdown of performances will be given in Section IV El 

Exponential & unsatisfiable phase (a > a c ): finally, finding a proof of unsatisfiability typically requires an ex- 
poncntiallv large number of splits |l5j|. Note that, as a gets higher and higher, each variable assignment affects 
more and more equations (of the order of a), and contradictions are detected earlier and earlier. Rigorous 
calculations show that T sp nt ~ ^[10], and the computational effort decreases with increasing a (Figure IT5|) . The 
median number of splits is considerably smaller for DPLL-GUC than for DPLL-UC, a result expected from the 
advantages of GUC against UC discussed above. 
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FIG. 14: Phase diagram of 2 + p-XORSAT and dynamical trajectories of DPLL. The threshold line a c (p) (bold full line) 
separates sat from unsat phases. Departure points for DPLL trajectories are located on the 3-XORSAT vertical axis with 
ratios .4, |,.8, 1. from bottom to top. The arrow indicates the direction of motion along trajectories parametrized by the 
fraction t of variables set by DPLL. For small ratios a < oie (= § for the UC heuristic) trajectories remain confined in the sat 
phase, end in S of coordinates (0, 0), where a solution is found. At oe the trajectory hits tangentially the threshold line in T 
of coordinates (j> f )• When a > oe the trajectories intersect the threshold line at some point G (which depends on a), and 
stops before hitting the cid{p) dotted line (| 122[) . After massive backtracking DPLL will find a solution; G corresponds to the 
highest node in the search tree. 



E. Linear phase: resolution trajectories in the 2 + p-XORSAT phase diagram 



Action of DPLL on an instance of 3-XORSAT causes changes to the numbers of variables and equationses, and 
thus to the ratio a. Furthermore DPLL turns equations with 3 variables into equation with 2 variables. A mixed 
2 + p-XORSAT distribution, where p is the fraction of 3-equations and a the ratio of the total number of 2- and 3- 
equations over the number of variables can be used to model what remains of the input system [8lj|. Repeating the 
calculations of Section ITTT1 for the 2 + p-XORSAT models we derive the phase diagram of Figure [Ml The Sat/Unsat 
critical line a c (p) separates the satisfiable from the unsatisfiable phases. For p < po = \ i.e. to the left of point T, the 
threshold line coincides with the percolation transition as in the 2-XORSAT model, and is given by a c (p) = 2 {i-p) ■ 
For p > po an intermediate clustered phase is found as in the 3-XORSAT model, and the threshold coincides with the 
vanishing of the cluster entropy s c i us ter (Section IIIjp . 

The phase diagram of 2+p-XORSAT is the natural space in which DPLL dynamic takes place. An input 3-XORSAT 
instance with ratio a shows up on the right vertical boundary of Figure [14] as a point of coordinates (p = 1, a). Under 
the action of DPLL the representative point moves aside from the 3-XORSAT axis and follows a trajectory, very much 
alike real-space renormalization, which depends on the splitting heuristic. Trajectories enjoy two essential features 
First the representative point of the system treated by DPLL does not 'leave' the 2+p-XORSAT phase diagram. 
In other words, the instance is, at any stage of the search process, uniformly distributed from the 2+p-XORSAT 
distribution conditioned to its equation per variable ratio a and fraction p of 3-equations. This assumption is not 



true for all heuristics of split, but holds for UC and GUC[l4|[82|. Secondly, the trajectory followed by an instance in 



the course of resolution is a stochastic object, due to the randomness of the instance and of the assignments done by 
DPLL. In the large size limit (N — ► oo) the trajectory becomes self-averageing i.e. concentrated around its average 
locus in the 2+p-XORSAT phase diagram [63j]. We will come back below on this concentration phenomenon. 

Let ao denote the equation per variable ratio of the 3-XORSAT instance to be solved. We call Ej (T) the number 
of j-equations (including j variables) after T variables have been assigned by the solving procedure. T will be 
called hereafter 'time', not to be confused with the computational effort. At time T = we have £"3(0) = aoN, 
£2(0) = -E-i(O) = 0. Assume that the variable x assigned at time T is chosen through unit-propagation, that is, 
independently of the j-equation content. Call rij(T) the number of occurrences of x in j-equations (j = 2,3). The 
evolution equations for the populations of 2-, 3-equations read 

E 3 (T + 1) = E 3 {T) - n 3 (T) , E 2 {T + 1) = E 2 (T) - n 2 (T) + n 3 {T) . (112) 

Flows n 2 , n 3 are of course random variables that depend on the instance under consideration at time T, and on the 
choice of variable done by DPLL. What are their distributions? At time T there remain N — T untouched variables; 
x appears in any of the Ej(T) j-equation with probability pj = N 3 _ T , independently of the other equations. In the 

large N limit and at fixed fraction of assigned variables, t = j^, the binomial distribution converges to a Poisson law 
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equation density e=E/N 




fraction l=T/N 



FIG. 15: Deterministic versus stochastic dynamics of the equation population E as a function of the number of steps T of the 
algorithm. On the slow time scale (fraction t = T/N) the density e = E/N of (2- or 3-) equations varies smoothly according 
to a deterministic law. Blowing up of the dynamics around some point t',e' shows the existence of small and fast fluctuations 
around this trajectory. Fluctuations are stochastic but their probability distribution depends upon the slow variables t' , e' only. 



with mean 



where 



(113) 



l-t N 
is the density of ^-equations at time T. The key remark is that, when N — ► oo, ej is a slowly varying and non 
stochastic quantity and is a function of the fraction t = rather than T itself. Let us iterate (|112|) between times 
T =tN and T + AT where 1 -C AT <C N e.g. AT = 0(VW) . Then the change AT 3 in the number of 3-equations is 
(minus) the sum of the stochastic variables n,j (T) for T = Tq, Tq + 1, . . . , Tq + AT. As these variables are uncorrelated 
Poisson variables with O(l) mean (|113[) AT3 will be of the order of AT, and the change in the density e 3 will be of 
order of AT/A — > 0. Applying central limit theorem AT3/AT will be almost surely equal to —(713)4 given by (|113|) 
and with the equation density measured at reduced time t. The argument can be extended to 2-equations, and we 
conclude that e2,e3 are deterministic (self-averaging) quantities obeying the two coupled differential equations [3| 

3 e 3 de 3 . . 3 e 3 2 e 2 



t 



Those equations, together with the initial condition e 3 (0) 

e 3 (t) = a (l-t) 3 , 



dt w l-t l-t 
a 0j e 2(0) = can be easily solved, 
e 2 (t) =3a t{l-t) 2 . 



(114) 



(115) 



To sum up, the dynamical evolution of the equation populations may be seen as a slow and deterministic evolution 
of the equation densities to which are superimposed fast, small fluctuations. The distribution of the fluctuations 
adiabatically follows the slow trajectory. This scenario is pictured in Figure [T5l 

Expressions (|115p for the equation densities allow us to draw the resolution trajectories corresponding to the action 
of DPLL on a 3-XORSAT instance. Initially the instance is represented by a point with coordinates (p = 1, a = ao) in 
Figure [T4l As more and more variables are assigned the representative point moves away from the rightmost vertical 
axis. After a fraction t of variables have been assigned the coordinates of the point are 



P(t) = 



l-t 



e 2 + e 3 l + 2t 



a(t) 



62 + e 3 
l-t 



a (l-t)(l + 2t) 



(116) 



Trajectories corresponding to various initial ratios are shown in Figure [Ml For small ratios a® < ue trajectories 
remain confined in the sat phase, end in S of coordinates (0,0), where a solution is found. 



At a E {- 

2> 



for the UC 



heuristic), the single branch trajectory hits tangentially the threshold line in T of coordinates (j, |). When ao > &e 
the trajectories enter the Unsat phase, meaning that DPLL has turned a satisfiable instance (if ag < a c ) into an 
unsatisfiable one as a result of poor assignments. It is natural to expect that is the highest ratio at which DPLL 
succeeds in finding a solution without resorting to much backtracking. 



F. Dynamics of unit-equations and universality 



The trajectories we have derived in the previous Section are correct provided no contradiction emerges. But 
contradictions may happen as soon as there are E\ — 2 unit-equations, and are all the more likely than E\ is large. 
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E'i Ei E? 

FIG. 16: Evolution of the number E\ of 1-equations as one more variable is assigned. ni denotes the number of 2-equations 
reduced to 1-equations, s\ the number of 1-equations satisfied. If E\ > 1 a variable is fixed through unit-propagation: E\ 
decreases by one plus si, and increases by n-z. In the absence of unit-equation (E\ — 0) the number of 1-equations after the 
assignment is simply E[ = 712 ■ 



Actually the set of 1-equations form a 1-XORSAT instance which is unsatisfiable with a finite probability as soon as 
Ei is of the order of >/N from the results of Section III Al Assume now that E\ (T) -C N after T variables have been 
assigned, what is the probability pt that no contradiction emerges when the T th variable is assigned by DPLL? This 
probability is clearly one when E\ = 0. When E\ > 1 we pick up a 1-equation, say, x% = 1, and wonder whether the 
opposite 1-equation, xq = 0, is present among the [E\ — 1) 1-equations left. As equations are uniformly distributed 
over the set of N — T untouched variables 

. \ max (.Ei (T)- 1,0) 



• (117) 

The presence of the max in the above equation ensures it remains correct even in the absence of unit-equations 
(£1 = 0). E\(T) is a stochastic variable. However from the decoupling between fast and slow time scales sketched 
in Figure [T5l the probability distribution of E\(T) depends only on the slow time scale t. Let us call n(E\\t) this 
probability. Multiplying ()1 17[) over the times T = to T = N— 1 we deduce the probability that DPLL has successfully 
found a solution without ever backtracking, 

Psuccess = exp ^ mC#i;*) i E i ~ !) ) ( 118 ) 

in the large N limit. 

We are left with the calculation of fi 29]. Figure [16] sketches the stochastic evolution of the number E\ during one 
step. The number of 1-equations produced from 2-equations, n.2, is a Poisson variable with average value, from (|115[) . 

d(t) = ^± = 6a t(l-t) (119) 

when N — > 00. The number of satisfied 1-equations, s\, is negligible as long as E\ remains bounded. The probability 
that the number of 1-equations goes from E\ to E[ when T —* T + 1 defines the entry of the transition matrix 

M(E[,E 1 ;t) = J2 e- d W^^6 E[ _ {El+n2 _s El) ■ (120) 

from which a master equation for the probability of E\ at time T may be written. On time scales 1 <C AT <C N this 
master equation converges to the equilibrium distribution \x [l6l [29| , conveniently expressed in terms of the generating 
function 

G {x -t) = £ , {El -t) = ( i; rf( y ( ) l K : ) : i 1 ) . m 



Bi>0 



The above is a sensible result for d(t) < 1 but does not make sense when d(t) > 1 since a probability cannot be negative! 
The reason is that we have derived (|121|) under the implicit condition that no contradiction was encountered. This 
assumption cannot hold when the average rate of 1-equation production, d(t), is larger that one, the rate at which 
1-equations are satisfed by unit-propagation. From (|119p we see, when a > ole = §, the trajectory would cross the 

MP) = ^ (122) 



on which d — 1 for some time tr> < 1. A contradiction is very likely to emerge before the crossing. 
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When a < cue d remains smaller than unity at any time. In this regime the probability of success reads, using 
QUI and (fT2T1) . 



3a 



Psuccess — exp . . 

4 Z V i — 3a 



3q 



tanh 1 



3o: 



2 -3a 



(123) 



Psuccess is a decreasing function of the ratio a, down from unity for a = to zero for a = a^. The present analysis 
of the UC heuristic can be easily transposed to the GUC heuristic. Details are not given here but can be found in 
The result is an expression for p SU ccess larger than its UC counterpart (|123[) . and vanishing in a# ~ 0.7507. 
Interestingly the way p success vanishes when a reaches cue-, 



- In p. 



success 



(ole — e) ~ e 



0+) 



(124) 



is the same for both heuristics. This similarity extends to a whole class of heuristics which can be described by 
the flow of equation densities only and based on unit-propagation 25]. The probability that DPLL finds a solution 
without backtracking to a 3-XORSAT instance of size N satisfies finite-size scaling at the dynamical critical point, 



- hi p. 



success 



(a B -e,A)~ A^ $(eiV3) 



(125) 



where the scaling function $ is independent of the heuristics and can be calculated exactly [25|]. The exponent 
characterizing the width of the critical region is the one associated to percolation in random graphs (|38|) . A consequence 
of (|125p is that, right at Oe, Psuccess ~ exp(— Cst x Ne ) decreases as a stretched exponential of the size. The value of 
the exponent, and its robustness against the splitting heuristics can be understood from the following argument [25[. 
Let us represent 1- and 2- equations by a graph G over the set of A — T vertices (one for each variable Xi) with 



E\ marked vertices (one for each unit-equation Xi — 0, 1), and E% signed edges (a;* 



0, 1), see Section ITTDl d 



is simply the average degree of vertices in G. Unit-propagation corresponds to removing a marked vertex (and its 
attached edges), after having marked its neighbours; the process is iterated until the connected component is entirely 
removed (no vertex is marked). Meanwhile, new edges have been created from the reduction of 3-equations into 
2-equations. Then a vertex is picked up according to the heuristic and marked, and unit-propagation resumes. The 
success/failure transition coincides with the percolation transition on G: d = 1 as expected. From random graph 
theory [l2j the percolation critical window is of width \d — 1| ~ A -1 / 3 . As d is proportional to the ratio ao (|119|) 
we find back ip = i. The time spent by resolution trajectories in the critical window is At ~ \J\d — 1| ~ A -1 / 6 , 
corresponding to AT = A At ~ TV 5 / 6 eliminated variables. As the largest components have size S ~ N 2 / 3 the number 
of such components eliminated is C — AT/S ~ A 1 / 6 . What is the probability q that a large component is removed 
without encountering a contradiction? During the removal of the component the number of marked vertices 'freely' 
diffuses, and reaches E\ ~ \/~S ~ TV 1 / 3 . The probability that no contradiction occurs is. from (|117[) . a ~ (1 — ^^) ElxS , 
a finite quantity. Thus p SU ccess 
this scaling. 



q c ~ exp(-A 1 / 6 ). The presence of numerous, smaller components does not affect 



G. Exponential phase: massive backtracking 

For ratios ao > cue DPLL is very likely to find a contradiction. Backtracking enters into play, and is responsible 
for the drastic slowing down of the algorithm (Figure [T3]) . 

The history of the search process can be represented by a search tree, where the nodes represent the variables 
assigned, and the descending edges their values (Figure IT?]). The leaves of the tree correspond to solutions (S), or to 
contradictions (C). The analysis of the a < ctE regime leads us to the conclusion that search trees look like Figure [TTR 
at small ratios [83J. Consider now the case of unsatisfiable formulas (ao > a c ) where all leaves carry contradictions 
after DPLL halts (Figure [TTC). DPLL builds the tree in a sequential manner, adding nodes and edges one after 
the other, and completing branches through backtracking steps. We can think of the same search tree built in a 
parallel way [Till]. At time (depth T) our tree is composed of L(T) < 2 T branches, each carrying a partial assignment 
over T variables. Step T consists in assigning one more variable to each branch, according to DPLL rules, that is, 
through unit-propagation or split. Possible consequences are: emergence of a contradiction and end of the branch, 
simplification of the attached formulas and the branch keeps growing. 

The number of branches L(T) is a stochastic variable. Its average value can be calculated as follows [5l|. Let us 
define the average number L(E;T) of branches with equation populations E = (Ei, E<i, E3) at depth T. Initially 
L(E;0) — 1 for E — (0,0, aoA), otherwise. Call M(E',E;T) the average number of branches with population E' 
generated from a branch with population E once the T th variable is assigned. Transition matrix M is an extension 
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FIG. 17: Search trees in three regimes of Section TV Dl A. linear, satisfiable (a < Qb); B. exponential, satisfiable (ole < a < a c ); 
C. exponential, unsatisfiable (a > a c ). Leaves are marked with S (solutions) or C (contradictions). G is the highest node to 
which DPLL backtracks, see Figure [14] 



of (|120|) to the whole population vector E and not only E\. We have < M < 2, the extreme values corresponding 
to a contradiction and to a split respectively. We claim that 

L(E'; T+l) = Y^ M(E', E; T) L{E; T) . (126) 

E 

Evolution equation (|126|) is somewhat suspicious since it looks like the approximation (1106j) we have done in the 
analysis of RWSAT. Yet a major difference exists which makes (|126p exact [5l|. Drawing randomly many times the 
same instance, as we are doing, is in principle forbidden but not along one branch for the very reason the analysis of 
Section IV El was correct. Actually what we have done in Section IV El is to draw randomly at time T the equations 
containing the T th variable. But this is correct since those equations are immediately simplified into shorter equations 
and their remaining content remains unknown [33[ ■ The situation seems more complicated in the case of the whole 
tree since the same equation can appear at different depth along distinct branches. Indeed the number of branches 
produced from two distinct branches after assignment of one variable are correlated variables. But thanks to the 
linearity of expectation those correlations do not matter and ()126|) is correct. 

Transition matrix M can be explicitely written down. It is more convenient to write (|126[) for the generating 
function of the number of branches, B(x; T) = ^2gL(E; T)xf x x 2 2 x 3 3 , with the result 

B{x; T + 1) = T) + (2 — 1)B(0, /a, h; T) - 2B(0; T) , (127) 

Ji Ji 

where / is the vector with components 

f L Hj f , 2(x 2 -x 1 ) 3(x 2 -x 3 ) 

The three terms on the r.h.s. of (|127p correspond, from left to right: unit-propagation (the branch keeps growing), 
variable splitting (2 branches are created from the previous one), branches carrying empty instances (satisfied in- 
stance). Equation (|127|) together with the initial condition B(x;0) = x^° N completely defines the average dynamics 
of the search tree. We sketch the main steps of its resolution below[l6|: 

1 . To count the number of branches irrespectively of the number of unit-equations we should consider the value 
X\ = 1. However, as long as branches grow the number E\ of unit-equations cannot be large, and remains 
bounded. We can therefore choose X\ = | which simplifies (| 12T[) without affecting the large size scaling of L 
and B. This technical trick is reminiscent of Knuth's kernel method [3^ |. 

2. For large iV it is reasonable to expect that the number of branches grows exponentially with the depth, or, 
equivalently, 

L{E 1 ,E 2 ,E 3 ; T) - e N K^,e 2 ;t)+o{N) ( 129 ) 
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where e2, are the densities of equations as usual. From point 1 the Legendre transform of A 

7(0:2, X3;t) — max [A(e2, e2;i) + ei lnx2 + ez In 2:3] (130) 

eo,es 



fulfills the partial differential equation (PDE) 

dl , „ 1 — 2x 2 c^7 3(^2 — £3) c?7 . „ . 

-I=ln2 + — — -^4- V2 - — 131 

with the initial condition 7(2:2, Xa;t) = ag UIX3. 

3. The first order PDE can be solved exactly with the characteristic method. The output, after Legendre inversion 
through (|130p . is the entropy A(e2,e3;t) of branches at reduced depth t. Let us call X*(t) the maximum value 
of A over the equation densities for a fixed fraction t of assigned variables. 

4. X*(t) is a function growing from A* = at t = 0, reaching a maximum value A^ in t;vf, and decreasing for larger 
times t < 1. ijvf is the depth in the tree of Figure [TTC where most contradictions are found; the number of C 
leaves is, to exponential order, e JVA * f . We conclude that the size of the tree we were looking for is 

Tsplit = X* M , (132) 

compare with (|111|) . For large a 3> ot c one finds T sp ut ~ In 2/ (6a) in agreement with fioj ]. The calculation can 
be extended to highers values of K . 

The above calculation holds for the unsatisfiable, exponential phase. How can we understand the satisfiable but 
exponential regime < a < a c l The resolution trajectory crosses the Sat/Unsat critical line at some point G 
shown in Figure [TJ] Immediately after G the instance left by DPLL is unsatisfiable. A subtree with all its leaves 
carrying contradictions will develop below G (Figure [TTJ3) . The size rf pUt of this subtree can be easily calculated from 
the above theory. The only change is the initial condition over 7: 7(2:2, 2:3; 0) = o.g{pg + (1 — Pg) In 2:2) where 
(pG,cta) are the coordinates of G which can be calculated from ao and the knowledge of the critical Sat/Unsat line. 
Once this subtree has been built DPLL backtracks to G, flips the attached variable and will finally end up with a 
solution. Hence the (log of the) number of splits necessary will be typically equal to T sp ut — (1 — £g) T %ut 03- 

VI. CONCLUSIONS 

Previous Sections have allowed us to illustrate rather general techniques and ideas to deal with random systems. 
It does not come as a surprise that other problems than XORSAT e.g. the satisfaction of Boolean constraints, graph 
coloring, the covering of vertices, ... have been successfully studied with these tools. Many of those problems, when 
given an input distribution based on random graphs, actually share a lot of common features with XORSAT. The 
reader is referred to 

@, El HI HI EE HI (satisfiability), [Hill (coloring), 0|| (vertex cover), ... for entry 
points to the literature. Let us also mention that many other interesting optimization problems, not directly related 
to random graphs, have been studied with the techni ques of Sections 4 and 5, and the results sometimes rigorously 
proven e.g. matching 0, [41], [56[, traveling salesman [42l |. number partitioning [H, H^, graph partitioning [3(j, ... 
Finally, from a historical point of view, one should not forget that statistical mechanics tools have found numerous 
and beautiful applications in the study of the learning and storage properties of neural networks [H, [53], all the more 
so the random satisfiability problem can be recast as an Ising perceptron problem [35| . 

The study of random optimization problems is obviously interesting from a probabilistic point of view. As far as 
computer science is concerned they can be seen as useful benchmarks for testing and improving resolution procedures. 
A successful example is the traduction of the cavity equations of Section 5 into an algorithm for solving given instances 
of the satisfiability problem [45] . This algorithm, called Survey Propagation, extends to the clustered phase the Belief 
Propagation procedure of wide-spread use in statistical inference, and is a very efficient procedure to find solutions 
to 3-Satisfiability slightly below threshold. Another application of statistical physics ideas is the conception of new 
heuristics for DPLL capable of proving the unsatisfiability of formulas with 700 hundreds variables at threshold [24| . 

Despite those successes important question remain open. First is there a relationship between clustering and hard- 
ness of resolution? This question is reminiscent of a very general issue in statistical physics, namely the relationship 



between dynamical and static properties of disordered or glassy systems [21|. The onset of clustering, or more precisely 
of strong correlations between variables over the space of solutions drastically worsens the performances of sampling 
algorithms e.g. Monte Carlo procedures [36|, [53|. However, in practical applications, one looks for a solution rather 
than for the sampling of the solution space... From this point of view knowing whether solutions are clustered or not 
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does not seem to be of crucial relevance. Actually a local and polynomial search strategy capable of finding solutions 
well above the clustering threshold has been explicitely found for various optimizations problems (37j . 

Another open question is what happens at large K, that is, when constraints involve more and more variables. The 
performances of all known algorithms, be they local search procedures or DPLL solvers, seem to deteriorate. Worst- 
case bound indicate that the large K case is very difficult [32j ■ From statistical mechanics point of view problems look 
like more and more the random energy model 43|as K increases, but can we beat the worst-case bounds on average? 
Finally let us mention a recent work by Feige [281 ] which, for the first time, showed that the complexity of solving 
random SAT (or XORSAT) model had a fundamental interest in worst-case approximation theory. Consider 3-SAT 
instances with ratio a ^> a c . Most of them have GS energy close to aN/2, but a very tiny fraction of those instances 
have energy smaller than, say, eN where e <C a is fixed. Is there a polynomial algorithm capable of recognizing all 
such atypical formulas from the vast majority of typical instances? Insights from statistical physics suggest that, the 
answer is positive for SAT (if we want most satisfiable instances to be detected and not all of them) while XORSAT 
seems to be much harder^! Actually, to the knowledge of the author, no local search algorithm (based on random 
walk, variable assigment, Monte Carlo, message-passing, cooling procedure, ...) is efficient for solving XORSAT. This 
makes the study of this problem even more valuable from a computer science point of view. 



APPENDIX A: A PRIMER ON LARGE DEVIATIONS 

Large deviation theory is the field of probability which deals with very unlikely events [231 ] . You are given a fair 
(unbiased) coin and toss it N times. The number H of head draws has probability 

p»W = w{h)- (A1) 

When N gets large H is highly concentrated around H* — N/2 with small relative fluctuations of the order of 0(>/~N). 
Yet we can ask for the probability of observing a fraction ft = H/N equal to say, 25%, of heads, far away from the 
likely value ft* — 50%. To calculate this probability we use Stirling's asymptotic expression for the binomial coefficient 
in (|A1|) to obtain 

PN {H = hN) = e - N »W+°( N ) , (A2) 

where 

uj(h) = ln2 + ftlnft+(l- ft) ln(l - ft) (A3) 

is called rate function. The meaning of (|A2j) is that events with value of ft ^ ft* are exponentially rare in N, and u)(h) 
give the decay (rate) exponent. The answer to our question is e ~ NuJ (.- 25 ) ^ e -o.i3W w jj en jy j s l ar g e . Some comments 
are: 

• ui(h) is strictly positive, except in ft = ft* = i where it vanishes. This is the only value for the fraction of head 
draws with non exponentially small— in— N probability. 

• Let ft = ft* + Sh where Sh is small. Using oj(h*) — u>'[h*) — we have 

P N (H = (h* + Sh)N) =exp[- N^u;"{h*){8h) 2 + ...] , (A4) 

that is, Sh is Gaussianly distributed with zero mean and variance (Nu"{h*))- 1 = (AN)- 1 . Hence central limit 
theorem is found back from the parabolic behaviour of the rate function around its minimum [84|. 

• lu is here a convex function of its argument. This property is true rate functions describing independent events. 
Indeed, suppose we have H positive (according to some criterion e.g. being a head for a coin) events among a 
set of N events, then another set of N' events among which H' are positive. If the two sets are uncorrelated 

PN+N , (H + H') > PN (H) x p N , (H') (A5) 

since the same total number H + H' of positive events could be observed in another combination of N + N 1 
events. Taking the logarithm and defining ft = H/N, ft' = H' /N, u = N / (N + N') we obtain 

w(uft+ (1 - u)ti) <uuj(h) + (l-u)w(h') , (A6) 



for any u € [0; 1]. Hence the representative curve of u> lies below the chord joining any two points on this curve, 
and u! is convex. Non-convex rate functions are found in presence of strong correlations [85j . 
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APPENDIX B: INEQUALITIES OF FIRST AND SECOND MOMENTS 

Let Af be a random variable taking values on the positive integers, and call p_\f its probability. We denote by (Af) 
and (Af 2 ) the first and second moments of (assumed to be finite), and write 

p(Af>l)= PAA-l-Po (Bl) 

^■=1,2,3,... 

the probability that Af is not equal to zero. Our aim is to show the inequalities 

^<p(Af>l)<{Af) . (B2) 
The right inequality, call 'first moment inequality', is straightforward: 

M N>l M>1 

Consider now the linear space made of vectors v = (vq, V\, V2, ■ ■ ■} whose components are labelled by positive integers, 
with the scalar product 

v ■ v' = ^2pfs v' M . (B4) 
M 

Choose now v^f = Af, and v' = 0, v'j^ = 1 for Af > 1. Then 

v • v = (Af 2 ) , v • v' = (Af) , v' • v' = P {Af > 1) . (B5) 
The left inequality in (|B2|1 is simply the Cauchy-Schwarz inequality for v, v': (v • v') 2 < (v • v) x (v' • v'). 

APPENDIX C: CORRECTIONS TO THE SADDLE-POINT CALCULATION OF (Af 2 ) 

In this Appendix we show that (Af 2 ) is asymptotically equivalent to (Af) 2 , where Af is the number of solutions of 
a 3-XORSAT formula with ratio a < cti ~ 0.889. This requires to take care of the finite-size corrections around the 
saddle-point calculations of Section III Fl Let Z = (z\, Z2, ■ ■ ■ , zn) denotes a configuration of variables at distance d 
from the zero configuration i.e. dN variables Zj are equal to 1, the other (1 — d)N variables are null. Let q(d,N) be 
the probability that Z satisifies the equation Zi + zj + Zk = where (i,j,k) is a random triplet of distinct integers 
(unbiased distribution): 



q(d,N) = -i- 
= Q(d) 



(CI) 

/ h(d)\ , 6d(2d~l) 



and q(d) is defined in (|26| with K = 3. Terms of the order of iV~ 2 have been discarded. 

Using formula ([2T]) with q(d) substituted with q(d,N) and the Stirling formula for the asymptotic behaviour of 
combinatorial coefficients we have 

(Af 2 ) - V e NA(d, a )+ a h(d) ( C2 n 

V 2 V2^Nd^/2nN(l - d) ' 

a — u j iv > 7f )••• 

where A(d,a) is defined in ((28]), and ~ indicates a true asymptotic equivalence (no multiplicative factor omitted). 
The r.h.s. of (|C2|) is the Riemann sum associated to the integral 

(Af 2 ) - ^ NM e NA(d, a )+ah(d) C C3 ) 

Jo yj2TiNd(l - d) 

We now estimate the integral through the saddle-point method. For a < ct2 — 0.889 the dominant contribution to the 
integral comes from the vicinity of d* = \ . There are quadratic fluctuations around this saddle-point, with a variance 
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equal to N times the inverse of (the modulus of) the second derivative Add of A with respect to d. Carrying out the 
Gaussian integral over those fluctuations we obtain 



since h(d*) = 0, Add(d* , a) = —4. Therefore, from the second moment inequality, Psat — * 1 when N — > oo at ratios 
smaller than a.2- 
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